{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Tce3stUlHN0L"
   },
   "source": [
    "##### Copyright 2024 Google LLC."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "cellView": "form",
    "id": "tuOe1ymfHZPu"
   },
   "outputs": [],
   "source": [
    "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "# https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dfsDR_omdNea"
   },
   "source": [
    "# Gemma - finetune with XTuner\n",
    "\n",
    "This notebook demonstrates how to finetune Gemma with XTuner. [XTuner](https://github.com/InternLM/xtuner) is an efficient, flexible, and full-featured toolkit for finetuning large language models (LLMs). It wraps Hugging Face's finetuning functionality behind a simple interface, which makes finetuning Gemma straightforward.\n",
    "\n",
    "<table align=\"left\">\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/Finetune_with_XTuner.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Setup\n",
    "\n",
    "### Select the Colab runtime\n",
    "To complete this tutorial, you'll need a Colab runtime with enough resources to run the Gemma model. You can use a T4 GPU:\n",
    "\n",
    "1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.\n",
    "2. Select **Change runtime type**.\n",
    "3. Under **Hardware accelerator**, select **T4 GPU**.\n",
    "\n",
    "\n",
    "### Gemma setup on Hugging Face\n",
    "XTuner uses Hugging Face under the hood, so you will need to:\n",
    "\n",
    "* Get access to Gemma on [huggingface.co](https://huggingface.co) by accepting the Gemma license on the Hugging Face page of the specific model, e.g., [Gemma 2B](https://huggingface.co/google/gemma-2b).\n",
    "* Generate a [Hugging Face access token](https://huggingface.co/docs/hub/en/security-tokens) and add it as a Colab secret named `HF_TOKEN`."
   ],
   "metadata": {
    "id": "MwMiP7jDdAL1"
   }
  },
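  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally, confirm that the runtime change took effect before continuing. The following is a minimal sketch, not part of XTuner: it assumes PyTorch is the framework in use (PyTorch comes preinstalled on Colab), and `describe_accelerator` is a hypothetical helper name.\n",
    "\n",
    "```python\n",
    "import importlib.util\n",
    "\n",
    "\n",
    "def describe_accelerator() -> str:\n",
    "    \"\"\"Return a short description of the first CUDA device, if any.\"\"\"\n",
    "    # Assumption: PyTorch is installed; report gracefully if it is not.\n",
    "    if importlib.util.find_spec(\"torch\") is None:\n",
    "        return \"PyTorch is not installed; cannot query the GPU.\"\n",
    "    import torch\n",
    "\n",
    "    if not torch.cuda.is_available():\n",
    "        return \"No CUDA GPU detected; re-check the Colab runtime type.\"\n",
    "    # On a T4 runtime, the reported device name should mention T4.\n",
    "    return f\"Using GPU: {torch.cuda.get_device_name(0)}\"\n",
    "\n",
    "\n",
    "print(describe_accelerator())\n",
    "```\n",
    "\n",
    "If the message reports no GPU, repeat the runtime selection steps above and reconnect."
   ]
  },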
  {
   "cell_type": "code",
   "source": [
    "import os\n",
    "from google.colab import userdata\n",
    "# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env\n",
    "# vars as appropriate for your system.\n",
    "os.environ[\"HF_TOKEN\"] = userdata.get(\"HF_TOKEN\")"
   ],
   "metadata": {
    "id": "AVvJYwne3hha"
   },
   "execution_count": 2,
   "outputs": []
  },
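  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If `HF_TOKEN` ends up empty (for example, when running outside Colab without exporting it), the problem typically surfaces only later, as an authorization error when downloading the gated Gemma weights. To fail earlier with a clearer message, you can check the variable right after setting it. This is a minimal sketch assuming the token lives in the `HF_TOKEN` environment variable, as set in the cell above; `require_hf_token` is a hypothetical helper, not an XTuner or Hugging Face API.\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "\n",
    "def require_hf_token(var_name: str = \"HF_TOKEN\") -> str:\n",
    "    \"\"\"Return the Hugging Face token from the environment, or fail early.\"\"\"\n",
    "    token = os.environ.get(var_name, \"\").strip()\n",
    "    if not token:\n",
    "        # Hypothetical guard: raise before any gated download is attempted.\n",
    "        raise RuntimeError(\n",
    "            f\"{var_name} is not set. Add it as a Colab secret or export it \"\n",
    "            \"before downloading gated Gemma weights.\"\n",
    "        )\n",
    "    return token\n",
    "```\n",
    "\n",
    "Calling `require_hf_token()` returns the token when it is set. It only checks that the variable is non-empty; it does not verify that the token is valid or that you have accepted the Gemma license."
   ]
  },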
  {
   "cell_type": "markdown",
   "source": [
    "### Install XTuner"
   ],
   "metadata": {
    "id": "8yUF4Hk5dOoz"
   }
  },
  {
   "cell_type": "code",
   "source": [
    "!pip install -U 'xtuner'"
   ],
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "4pY14h6_bDrr",
    "outputId": "fe4d1b4e-e09d-4716-87a5-d1394c4cce25"
   },
   "execution_count": 3,
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Collecting xtuner\n",
      "  Downloading xtuner-0.1.19-py3-none-any.whl (1.4 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.4/1.4 MB\u001b[0m \u001b[31m6.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting bitsandbytes>=0.40.0.post4 (from xtuner)\n",
      "  Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl (119.8 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m119.8/119.8 MB\u001b[0m \u001b[31m12.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting datasets>=2.16.0 (from xtuner)\n",
      "  Downloading datasets-2.19.1-py3-none-any.whl (542 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m542.0/542.0 kB\u001b[0m \u001b[31m40.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting einops (from xtuner)\n",
      "  Downloading einops-0.8.0-py3-none-any.whl (43 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.2/43.2 kB\u001b[0m \u001b[31m6.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting lagent>=0.1.2 (from xtuner)\n",
      "  Downloading lagent-0.2.2-py3-none-any.whl (69 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m69.7/69.7 kB\u001b[0m \u001b[31m9.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting mmengine>=0.10.3 (from xtuner)\n",
      "  Downloading mmengine-0.10.4-py3-none-any.whl (451 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m451.7/451.7 kB\u001b[0m \u001b[31m43.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: openpyxl in /usr/local/lib/python3.10/dist-packages (from xtuner) (3.1.3)\n",
      "Collecting peft>=0.4.0 (from xtuner)\n",
      "  Downloading peft-0.11.1-py3-none-any.whl (251 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m251.6/251.6 kB\u001b[0m \u001b[31m33.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: scikit-image in /usr/local/lib/python3.10/dist-packages (from xtuner) (0.19.3)\n",
      "Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from xtuner) (1.11.4)\n",
      "Requirement already satisfied: SentencePiece in /usr/local/lib/python3.10/dist-packages (from xtuner) (0.1.99)\n",
      "Collecting tiktoken (from xtuner)\n",
      "  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m58.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from xtuner) (2.3.0+cu121)\n",
      "Requirement already satisfied: torchvision in /usr/local/lib/python3.10/dist-packages (from xtuner) (0.18.0+cu121)\n",
      "Requirement already satisfied: transformers!=4.38.0,!=4.38.1,!=4.38.2,>=4.36.0 in /usr/local/lib/python3.10/dist-packages (from xtuner) (4.41.1)\n",
      "Collecting transformers-stream-generator (from xtuner)\n",
      "  Downloading transformers-stream-generator-0.0.5.tar.gz (13 kB)\n",
      "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from bitsandbytes>=0.40.0.post4->xtuner) (1.25.2)\n",
      "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from datasets>=2.16.0->xtuner) (3.14.0)\n",
      "Requirement already satisfied: pyarrow>=12.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.16.0->xtuner) (14.0.2)\n",
      "Requirement already satisfied: pyarrow-hotfix in /usr/local/lib/python3.10/dist-packages (from datasets>=2.16.0->xtuner) (0.6)\n",
      "Collecting dill<0.3.9,>=0.3.0 (from datasets>=2.16.0->xtuner)\n",
      "  Downloading dill-0.3.8-py3-none-any.whl (116 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m13.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets>=2.16.0->xtuner) (2.0.3)\n",
      "Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.16.0->xtuner) (2.31.0)\n",
      "Requirement already satisfied: tqdm>=4.62.1 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.16.0->xtuner) (4.66.4)\n",
      "Collecting xxhash (from datasets>=2.16.0->xtuner)\n",
      "  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m14.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting multiprocess (from datasets>=2.16.0->xtuner)\n",
      "  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m13.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: fsspec[http]<=2024.3.1,>=2023.1.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.16.0->xtuner) (2023.6.0)\n",
      "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets>=2.16.0->xtuner) (3.9.5)\n",
      "Requirement already satisfied: huggingface-hub>=0.21.2 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.16.0->xtuner) (0.23.2)\n",
      "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from datasets>=2.16.0->xtuner) (24.0)\n",
      "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.16.0->xtuner) (6.0.1)\n",
      "Collecting arxiv (from lagent>=0.1.2->xtuner)\n",
      "  Downloading arxiv-2.1.0-py3-none-any.whl (11 kB)\n",
      "Requirement already satisfied: distro in /usr/lib/python3/dist-packages (from lagent>=0.1.2->xtuner) (1.7.0)\n",
      "Collecting func-timeout (from lagent>=0.1.2->xtuner)\n",
      "  Downloading func_timeout-4.3.5.tar.gz (44 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m44.3/44.3 kB\u001b[0m \u001b[31m3.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "Collecting griffe (from lagent>=0.1.2->xtuner)\n",
      "  Downloading griffe-0.45.2-py3-none-any.whl (120 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m120.3/120.3 kB\u001b[0m \u001b[31m13.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting json5 (from lagent>=0.1.2->xtuner)\n",
      "  Downloading json5-0.9.25-py3-none-any.whl (30 kB)\n",
      "Requirement already satisfied: jsonschema in /usr/local/lib/python3.10/dist-packages (from lagent>=0.1.2->xtuner) (4.19.2)\n",
      "Collecting jupyter (from lagent>=0.1.2->xtuner)\n",
      "  Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)\n",
      "Requirement already satisfied: jupyter-client in /usr/local/lib/python3.10/dist-packages (from lagent>=0.1.2->xtuner) (6.1.12)\n",
      "Collecting phx-class-registry (from lagent>=0.1.2->xtuner)\n",
      "  Downloading phx_class_registry-4.1.0-py3-none-any.whl (13 kB)\n",
      "Collecting streamlit (from lagent>=0.1.2->xtuner)\n",
      "  Downloading streamlit-1.35.0-py2.py3-none-any.whl (8.6 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m8.6/8.6 MB\u001b[0m \u001b[31m70.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from lagent>=0.1.2->xtuner) (4.12.0)\n",
      "Collecting addict (from mmengine>=0.10.3->xtuner)\n",
      "  Downloading addict-2.4.0-py3-none-any.whl (3.8 kB)\n",
      "Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from mmengine>=0.10.3->xtuner) (3.7.1)\n",
      "Requirement already satisfied: rich in /usr/local/lib/python3.10/dist-packages (from mmengine>=0.10.3->xtuner) (13.7.1)\n",
      "Requirement already satisfied: termcolor in /usr/local/lib/python3.10/dist-packages (from mmengine>=0.10.3->xtuner) (2.4.0)\n",
      "Collecting yapf (from mmengine>=0.10.3->xtuner)\n",
      "  Downloading yapf-0.40.2-py3-none-any.whl (254 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m254.7/254.7 kB\u001b[0m \u001b[31m22.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: opencv-python>=3 in /usr/local/lib/python3.10/dist-packages (from mmengine>=0.10.3->xtuner) (4.8.0.76)\n",
      "Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from peft>=0.4.0->xtuner) (5.9.5)\n",
      "Collecting accelerate>=0.21.0 (from peft>=0.4.0->xtuner)\n",
      "  Downloading accelerate-0.30.1-py3-none-any.whl (302 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m302.6/302.6 kB\u001b[0m \u001b[31m27.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: safetensors in /usr/local/lib/python3.10/dist-packages (from peft>=0.4.0->xtuner) (0.4.3)\n",
      "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->xtuner) (1.12.1)\n",
      "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->xtuner) (3.3)\n",
      "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->xtuner) (3.1.4)\n",
      "Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch->xtuner)\n",
      "  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)\n",
      "Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch->xtuner)\n",
      "  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)\n",
      "Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch->xtuner)\n",
      "  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)\n",
      "Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch->xtuner)\n",
      "  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)\n",
      "Collecting nvidia-cublas-cu12==12.1.3.1 (from torch->xtuner)\n",
      "  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)\n",
      "Collecting nvidia-cufft-cu12==11.0.2.54 (from torch->xtuner)\n",
      "  Using cached nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)\n",
      "Collecting nvidia-curand-cu12==10.3.2.106 (from torch->xtuner)\n",
      "  Using cached nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)\n",
      "Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch->xtuner)\n",
      "  Using cached nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)\n",
      "Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch->xtuner)\n",
      "  Using cached nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)\n",
      "Collecting nvidia-nccl-cu12==2.20.5 (from torch->xtuner)\n",
      "  Using cached nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)\n",
      "Collecting nvidia-nvtx-cu12==12.1.105 (from torch->xtuner)\n",
      "  Using cached nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)\n",
      "Requirement already satisfied: triton==2.3.0 in /usr/local/lib/python3.10/dist-packages (from torch->xtuner) (2.3.0)\n",
      "Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch->xtuner)\n",
      "  Downloading nvidia_nvjitlink_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl (21.3 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.3/21.3 MB\u001b[0m \u001b[31m50.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers!=4.38.0,!=4.38.1,!=4.38.2,>=4.36.0->xtuner) (2024.5.15)\n",
      "Requirement already satisfied: tokenizers<0.20,>=0.19 in /usr/local/lib/python3.10/dist-packages (from transformers!=4.38.0,!=4.38.1,!=4.38.2,>=4.36.0->xtuner) (0.19.1)\n",
      "Requirement already satisfied: et-xmlfile in /usr/local/lib/python3.10/dist-packages (from openpyxl->xtuner) (1.1.0)\n",
      "Requirement already satisfied: pillow!=7.1.0,!=7.1.1,!=8.3.0,>=6.1.0 in /usr/local/lib/python3.10/dist-packages (from scikit-image->xtuner) (9.4.0)\n",
      "Requirement already satisfied: imageio>=2.4.1 in /usr/local/lib/python3.10/dist-packages (from scikit-image->xtuner) (2.31.6)\n",
      "Requirement already satisfied: tifffile>=2019.7.26 in /usr/local/lib/python3.10/dist-packages (from scikit-image->xtuner) (2024.5.22)\n",
      "Requirement already satisfied: PyWavelets>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-image->xtuner) (1.6.0)\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.16.0->xtuner) (1.3.1)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.16.0->xtuner) (23.2.0)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.16.0->xtuner) (1.4.1)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.16.0->xtuner) (6.0.5)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.16.0->xtuner) (1.9.4)\n",
      "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.16.0->xtuner) (4.0.3)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.16.0->xtuner) (3.3.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.16.0->xtuner) (3.7)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.16.0->xtuner) (2.0.7)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.16.0->xtuner) (2024.2.2)\n",
      "Collecting feedparser==6.0.10 (from arxiv->lagent>=0.1.2->xtuner)\n",
      "  Downloading feedparser-6.0.10-py3-none-any.whl (81 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m81.1/81.1 kB\u001b[0m \u001b[31m13.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting sgmllib3k (from feedparser==6.0.10->arxiv->lagent>=0.1.2->xtuner)\n",
      "  Downloading sgmllib3k-1.0.0.tar.gz (5.8 kB)\n",
      "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "Collecting colorama>=0.4 (from griffe->lagent>=0.1.2->xtuner)\n",
      "  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)\n",
      "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->xtuner) (2.1.5)\n",
      "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema->lagent>=0.1.2->xtuner) (2023.12.1)\n",
      "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema->lagent>=0.1.2->xtuner) (0.35.1)\n",
      "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema->lagent>=0.1.2->xtuner) (0.18.1)\n",
      "Requirement already satisfied: notebook in /usr/local/lib/python3.10/dist-packages (from jupyter->lagent>=0.1.2->xtuner) (6.5.5)\n",
      "Collecting qtconsole (from jupyter->lagent>=0.1.2->xtuner)\n",
      "  Downloading qtconsole-5.5.2-py3-none-any.whl (123 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m123.4/123.4 kB\u001b[0m \u001b[31m18.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: jupyter-console in /usr/local/lib/python3.10/dist-packages (from jupyter->lagent>=0.1.2->xtuner) (6.1.0)\n",
      "Requirement already satisfied: nbconvert in /usr/local/lib/python3.10/dist-packages (from jupyter->lagent>=0.1.2->xtuner) (6.5.4)\n",
      "Requirement already satisfied: ipykernel in /usr/local/lib/python3.10/dist-packages (from jupyter->lagent>=0.1.2->xtuner) (5.5.6)\n",
      "Requirement already satisfied: ipywidgets in /usr/local/lib/python3.10/dist-packages (from jupyter->lagent>=0.1.2->xtuner) (7.7.1)\n",
      "Requirement already satisfied: traitlets in /usr/local/lib/python3.10/dist-packages (from jupyter-client->lagent>=0.1.2->xtuner) (5.7.1)\n",
      "Requirement already satisfied: jupyter-core>=4.6.0 in /usr/local/lib/python3.10/dist-packages (from jupyter-client->lagent>=0.1.2->xtuner) (5.7.2)\n",
      "Requirement already satisfied: pyzmq>=13 in /usr/local/lib/python3.10/dist-packages (from jupyter-client->lagent>=0.1.2->xtuner) (24.0.1)\n",
      "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.10/dist-packages (from jupyter-client->lagent>=0.1.2->xtuner) (2.8.2)\n",
      "Requirement already satisfied: tornado>=4.1 in /usr/local/lib/python3.10/dist-packages (from jupyter-client->lagent>=0.1.2->xtuner) (6.3.3)\n",
      "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mmengine>=0.10.3->xtuner) (1.2.1)\n",
      "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mmengine>=0.10.3->xtuner) (0.12.1)\n",
      "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mmengine>=0.10.3->xtuner) (4.52.4)\n",
      "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mmengine>=0.10.3->xtuner) (1.4.5)\n",
      "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mmengine>=0.10.3->xtuner) (3.1.2)\n",
      "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets>=2.16.0->xtuner) (2023.4)\n",
      "Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets>=2.16.0->xtuner) (2024.1)\n",
      "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich->mmengine>=0.10.3->xtuner) (3.0.0)\n",
      "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich->mmengine>=0.10.3->xtuner) (2.16.1)\n",
      "Requirement already satisfied: altair<6,>=4.0 in /usr/local/lib/python3.10/dist-packages (from streamlit->lagent>=0.1.2->xtuner) (4.2.2)\n",
      "Requirement already satisfied: blinker<2,>=1.0.0 in /usr/lib/python3/dist-packages (from streamlit->lagent>=0.1.2->xtuner) (1.4)\n",
      "Requirement already satisfied: cachetools<6,>=4.0 in /usr/local/lib/python3.10/dist-packages (from streamlit->lagent>=0.1.2->xtuner) (5.3.3)\n",
      "Requirement already satisfied: click<9,>=7.0 in /usr/local/lib/python3.10/dist-packages (from streamlit->lagent>=0.1.2->xtuner) (8.1.7)\n",
      "Requirement already satisfied: protobuf<5,>=3.20 in /usr/local/lib/python3.10/dist-packages (from streamlit->lagent>=0.1.2->xtuner) (3.20.3)\n",
      "Requirement already satisfied: tenacity<9,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from streamlit->lagent>=0.1.2->xtuner) (8.3.0)\n",
      "Requirement already satisfied: toml<2,>=0.10.1 in /usr/local/lib/python3.10/dist-packages (from streamlit->lagent>=0.1.2->xtuner) (0.10.2)\n",
      "Collecting gitpython!=3.1.19,<4,>=3.0.7 (from streamlit->lagent>=0.1.2->xtuner)\n",
      "  Downloading GitPython-3.1.43-py3-none-any.whl (207 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m207.3/207.3 kB\u001b[0m \u001b[31m30.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting pydeck<1,>=0.8.0b4 (from streamlit->lagent>=0.1.2->xtuner)\n",
      "  Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.9/6.9 MB\u001b[0m \u001b[31m87.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting watchdog>=2.1.5 (from streamlit->lagent>=0.1.2->xtuner)\n",
      "  Downloading watchdog-4.0.1-py3-none-manylinux2014_x86_64.whl (83 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m83.0/83.0 kB\u001b[0m \u001b[31m14.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: mpmath<1.4.0,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->xtuner) (1.3.0)\n",
      "Requirement already satisfied: importlib-metadata>=6.6.0 in /usr/local/lib/python3.10/dist-packages (from yapf->mmengine>=0.10.3->xtuner) (7.1.0)\n",
      "Requirement already satisfied: platformdirs>=3.5.1 in /usr/local/lib/python3.10/dist-packages (from yapf->mmengine>=0.10.3->xtuner) (4.2.2)\n",
      "Requirement already satisfied: tomli>=2.0.1 in /usr/local/lib/python3.10/dist-packages (from yapf->mmengine>=0.10.3->xtuner) (2.0.1)\n",
      "Requirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-packages (from altair<6,>=4.0->streamlit->lagent>=0.1.2->xtuner) (0.4)\n",
      "Requirement already satisfied: toolz in /usr/local/lib/python3.10/dist-packages (from altair<6,>=4.0->streamlit->lagent>=0.1.2->xtuner) (0.12.1)\n",
      "Collecting gitdb<5,>=4.0.1 (from gitpython!=3.1.19,<4,>=3.0.7->streamlit->lagent>=0.1.2->xtuner)\n",
      "  Downloading gitdb-4.0.11-py3-none-any.whl (62 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.7/62.7 kB\u001b[0m \u001b[31m11.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata>=6.6.0->yapf->mmengine>=0.10.3->xtuner) (3.19.0)\n",
      "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich->mmengine>=0.10.3->xtuner) (0.1.2)\n",
      "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.1->jupyter-client->lagent>=0.1.2->xtuner) (1.16.0)\n",
      "Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.10/dist-packages (from ipykernel->jupyter->lagent>=0.1.2->xtuner) (0.2.0)\n",
      "Requirement already satisfied: ipython>=5.0.0 in /usr/local/lib/python3.10/dist-packages (from ipykernel->jupyter->lagent>=0.1.2->xtuner) (7.34.0)\n",
      "Requirement already satisfied: widgetsnbextension~=3.6.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->jupyter->lagent>=0.1.2->xtuner) (3.6.6)\n",
      "Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from ipywidgets->jupyter->lagent>=0.1.2->xtuner) (3.0.11)\n",
      "Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from jupyter-console->jupyter->lagent>=0.1.2->xtuner) (3.0.45)\n",
      "Requirement already satisfied: lxml in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter->lagent>=0.1.2->xtuner) (4.9.4)\n",
      "Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter->lagent>=0.1.2->xtuner) (4.12.3)\n",
      "Requirement already satisfied: bleach in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter->lagent>=0.1.2->xtuner) (6.1.0)\n",
      "Requirement already satisfied: defusedxml in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter->lagent>=0.1.2->xtuner) (0.7.1)\n",
      "Requirement already satisfied: jupyterlab-pygments in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter->lagent>=0.1.2->xtuner) (0.3.0)\n",
      "Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter->lagent>=0.1.2->xtuner) (0.8.4)\n",
      "Requirement already satisfied: nbclient>=0.5.0 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter->lagent>=0.1.2->xtuner) (0.10.0)\n",
      "Requirement already satisfied: nbformat>=5.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter->lagent>=0.1.2->xtuner) (5.10.4)\n",
      "Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter->lagent>=0.1.2->xtuner) (1.5.1)\n",
      "Requirement already satisfied: tinycss2 in /usr/local/lib/python3.10/dist-packages (from nbconvert->jupyter->lagent>=0.1.2->xtuner) (1.3.0)\n",
      "Requirement already satisfied: argon2-cffi in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter->lagent>=0.1.2->xtuner) (23.1.0)\n",
      "Requirement already satisfied: nest-asyncio>=1.5 in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter->lagent>=0.1.2->xtuner) (1.6.0)\n",
      "Requirement already satisfied: Send2Trash>=1.8.0 in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter->lagent>=0.1.2->xtuner) (1.8.3)\n",
      "Requirement already satisfied: terminado>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter->lagent>=0.1.2->xtuner) (0.18.1)\n",
      "Requirement already satisfied: prometheus-client in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter->lagent>=0.1.2->xtuner) (0.20.0)\n",
      "Requirement already satisfied: nbclassic>=0.4.7 in /usr/local/lib/python3.10/dist-packages (from notebook->jupyter->lagent>=0.1.2->xtuner) (1.1.0)\n",
      "Collecting qtpy>=2.4.0 (from qtconsole->jupyter->lagent>=0.1.2->xtuner)\n",
      "  Downloading QtPy-2.4.1-py3-none-any.whl (93 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m93.5/93.5 kB\u001b[0m \u001b[31m16.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->gitpython!=3.1.19,<4,>=3.0.7->streamlit->lagent>=0.1.2->xtuner)\n",
      "  Downloading smmap-5.0.1-py3-none-any.whl (24 kB)\n",
      "Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.10/dist-packages (from ipython>=5.0.0->ipykernel->jupyter->lagent>=0.1.2->xtuner) (67.7.2)\n",
      "Collecting jedi>=0.16 (from ipython>=5.0.0->ipykernel->jupyter->lagent>=0.1.2->xtuner)\n",
      "  Downloading jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.6/1.6 MB\u001b[0m \u001b[31m88.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: decorator in /usr/local/lib/python3.10/dist-packages (from ipython>=5.0.0->ipykernel->jupyter->lagent>=0.1.2->xtuner) (4.4.2)\n",
      "Requirement already satisfied: pickleshare in /usr/local/lib/python3.10/dist-packages (from ipython>=5.0.0->ipykernel->jupyter->lagent>=0.1.2->xtuner) (0.7.5)\n",
      "Requirement already satisfied: backcall in /usr/local/lib/python3.10/dist-packages (from ipython>=5.0.0->ipykernel->jupyter->lagent>=0.1.2->xtuner) (0.2.0)\n",
      "Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.10/dist-packages (from ipython>=5.0.0->ipykernel->jupyter->lagent>=0.1.2->xtuner) (0.1.7)\n",
      "Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.10/dist-packages (from ipython>=5.0.0->ipykernel->jupyter->lagent>=0.1.2->xtuner) (4.9.0)\n",
      "Requirement already satisfied: notebook-shim>=0.2.3 in /usr/local/lib/python3.10/dist-packages (from nbclassic>=0.4.7->notebook->jupyter->lagent>=0.1.2->xtuner) (0.2.4)\n",
      "Requirement already satisfied: fastjsonschema>=2.15 in /usr/local/lib/python3.10/dist-packages (from nbformat>=5.1->nbconvert->jupyter->lagent>=0.1.2->xtuner) (2.19.1)\n",
      "Requirement already satisfied: wcwidth in /usr/local/lib/python3.10/dist-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->jupyter-console->jupyter->lagent>=0.1.2->xtuner) (0.2.13)\n",
      "Requirement already satisfied: ptyprocess in /usr/local/lib/python3.10/dist-packages (from terminado>=0.8.3->notebook->jupyter->lagent>=0.1.2->xtuner) (0.7.0)\n",
      "Requirement already satisfied: argon2-cffi-bindings in /usr/local/lib/python3.10/dist-packages (from argon2-cffi->notebook->jupyter->lagent>=0.1.2->xtuner) (21.2.0)\n",
      "Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4->nbconvert->jupyter->lagent>=0.1.2->xtuner) (2.5)\n",
      "Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from bleach->nbconvert->jupyter->lagent>=0.1.2->xtuner) (0.5.1)\n",
      "Requirement already satisfied: parso<0.9.0,>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from jedi>=0.16->ipython>=5.0.0->ipykernel->jupyter->lagent>=0.1.2->xtuner) (0.8.4)\n",
      "Requirement already satisfied: jupyter-server<3,>=1.8 in /usr/local/lib/python3.10/dist-packages (from notebook-shim>=0.2.3->nbclassic>=0.4.7->notebook->jupyter->lagent>=0.1.2->xtuner) (1.24.0)\n",
      "Requirement already satisfied: cffi>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from argon2-cffi-bindings->argon2-cffi->notebook->jupyter->lagent>=0.1.2->xtuner) (1.16.0)\n",
      "Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.0.1->argon2-cffi-bindings->argon2-cffi->notebook->jupyter->lagent>=0.1.2->xtuner) (2.22)\n",
      "Requirement already satisfied: anyio<4,>=3.1.0 in /usr/local/lib/python3.10/dist-packages (from jupyter-server<3,>=1.8->notebook-shim>=0.2.3->nbclassic>=0.4.7->notebook->jupyter->lagent>=0.1.2->xtuner) (3.7.1)\n",
      "Requirement already satisfied: websocket-client in /usr/local/lib/python3.10/dist-packages (from jupyter-server<3,>=1.8->notebook-shim>=0.2.3->nbclassic>=0.4.7->notebook->jupyter->lagent>=0.1.2->xtuner) (1.8.0)\n",
      "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.10/dist-packages (from anyio<4,>=3.1.0->jupyter-server<3,>=1.8->notebook-shim>=0.2.3->nbclassic>=0.4.7->notebook->jupyter->lagent>=0.1.2->xtuner) (1.3.1)\n",
      "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<4,>=3.1.0->jupyter-server<3,>=1.8->notebook-shim>=0.2.3->nbclassic>=0.4.7->notebook->jupyter->lagent>=0.1.2->xtuner) (1.2.1)\n",
      "Building wheels for collected packages: transformers-stream-generator, func-timeout, sgmllib3k\n",
      "  Building wheel for transformers-stream-generator (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "  Created wheel for transformers-stream-generator: filename=transformers_stream_generator-0.0.5-py3-none-any.whl size=12425 sha256=b3eed23d845cad4cb2cdb68fb5c8f91cdbda35dfcf7f7361d798c7ab9de60f31\n",
      "  Stored in directory: /root/.cache/pip/wheels/95/4a/90/140f7b67d125906f6a165f38aad212ecb4a695ad0d87582437\n",
      "  Building wheel for func-timeout (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "  Created wheel for func-timeout: filename=func_timeout-4.3.5-py3-none-any.whl size=15080 sha256=2f04978bf6d08bc7edf8379e944da4b13a901853a56162384b01921491df9059\n",
      "  Stored in directory: /root/.cache/pip/wheels/3f/83/19/b5552bb9630e353f7c5b15be44bf10900afe1abbbfcf536afd\n",
      "  Building wheel for sgmllib3k (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "  Created wheel for sgmllib3k: filename=sgmllib3k-1.0.0-py3-none-any.whl size=6049 sha256=1cefce9c5033d07be8252f3085c34b6b2502f1688143537e2d78bdb76f539881\n",
      "  Stored in directory: /root/.cache/pip/wheels/f0/69/93/a47e9d621be168e9e33c7ce60524393c0b92ae83cf6c6e89c5\n",
      "Successfully built transformers-stream-generator func-timeout sgmllib3k\n",
      "Installing collected packages: sgmllib3k, func-timeout, addict, xxhash, watchdog, smmap, qtpy, phx-class-registry, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, json5, jedi, feedparser, einops, dill, colorama, yapf, tiktoken, pydeck, nvidia-cusparse-cu12, nvidia-cudnn-cu12, multiprocess, griffe, gitdb, arxiv, nvidia-cusolver-cu12, mmengine, gitpython, qtconsole, datasets, transformers-stream-generator, streamlit, bitsandbytes, accelerate, peft, jupyter, lagent, xtuner\n",
      "Successfully installed accelerate-0.30.1 addict-2.4.0 arxiv-2.1.0 bitsandbytes-0.43.1 colorama-0.4.6 datasets-2.19.1 dill-0.3.8 einops-0.8.0 feedparser-6.0.10 func-timeout-4.3.5 gitdb-4.0.11 gitpython-3.1.43 griffe-0.45.2 jedi-0.19.1 json5-0.9.25 jupyter-1.0.0 lagent-0.2.2 mmengine-0.10.4 multiprocess-0.70.16 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.5.40 nvidia-nvtx-cu12-12.1.105 peft-0.11.1 phx-class-registry-4.1.0 pydeck-0.9.1 qtconsole-5.5.2 qtpy-2.4.1 sgmllib3k-1.0.0 smmap-5.0.1 streamlit-1.35.0 tiktoken-0.7.0 transformers-stream-generator-0.0.5 watchdog-4.0.1 xtuner-0.1.19 xxhash-3.4.1 yapf-0.40.2\n"
     ]
    }
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Finetune Gemma\n",
    "\n",
    "XTuner ships with many built-in configurations for finetuning various LLMs. List the ones available for Gemma with the command below. If you are curious what they look like or want to make adjustments, take a look at the [config files](https://github.com/InternLM/xtuner/tree/main/xtuner/configs/gemma)."
   ],
   "metadata": {
    "id": "Di9D2DY5dqmw"
   }
  },
  {
   "cell_type": "code",
   "source": [
    "!xtuner list-cfg | grep gemma"
   ],
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "gWIzVxhwcDSw",
    "outputId": "284b696c-c1a5-43b7-9748-9f8a90c2f3e3"
   },
   "execution_count": 4,
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "gemma_2b_full_alpaca_e3\n",
      "gemma_2b_it_full_alpaca_e3\n",
      "gemma_2b_it_qlora_alpaca_e3\n",
      "gemma_2b_qlora_alpaca_e3\n",
      "gemma_7b_full_alpaca_e3\n",
      "gemma_7b_it_full_alpaca_e3\n",
      "gemma_7b_it_qlora_alpaca_e3\n",
      "gemma_7b_qlora_alpaca_e3\n"
     ]
    }
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
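    "For demonstration, this notebook finetunes the instruction-tuned Gemma 2B model (`google/gemma-2b-it`) using [QLoRA](https://arxiv.org/abs/2305.14314) on the [Alpaca dataset](https://huggingface.co/datasets/tatsu-lab/alpaca). You can optionally enable DeepSpeed as well, for example by appending `--deepspeed deepspeed_zero2` to the `xtuner train` command (this requires installing XTuner with the `deepspeed` extra)."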
   ],
   "metadata": {
    "id": "sXm65eC5eR4n"
   }
  },
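  {
   "cell_type": "markdown",
   "source": [
    "Before launching the run, it can help to estimate its length from the config. The cell below is a rough back-of-the-envelope sketch, not an XTuner API: it copies `batch_size`, `accumulative_counts`, `max_epochs`, `warmup_ratio`, and `max_length` from the config that `xtuner train` prints, and uses the 2,181 packed training samples the run reports after packing Alpaca into 2048-token sequences. It assumes those values are unchanged and that a final, incomplete accumulation window still triggers one optimizer update, so treat the step count as approximate. Note the design choice in this config: with `batch_size=1` and 16 accumulation steps, each update sees an effective batch of 16 packed sequences while only one sequence's activations are held in memory per iteration."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "source": [
    "import math\n",
    "\n",
    "# Values copied from the config printed by `xtuner train gemma_2b_it_qlora_alpaca_e3`\n",
    "# and from its 'Num train samples 2181' log line (assumed unchanged).\n",
    "num_train_samples = 2181  # packed sequences of up to max_length tokens\n",
    "batch_size = 1\n",
    "accumulative_counts = 16\n",
    "max_epochs = 3\n",
    "warmup_ratio = 0.03\n",
    "max_length = 2048\n",
    "\n",
    "# One training iteration consumes one batch.\n",
    "iters_per_epoch = math.ceil(num_train_samples / batch_size)\n",
    "total_iters = iters_per_epoch * max_epochs\n",
    "\n",
    "# Gradients are accumulated over `accumulative_counts` iterations per optimizer update.\n",
    "effective_batch = batch_size * accumulative_counts\n",
    "optimizer_steps = math.ceil(total_iters / accumulative_counts)\n",
    "\n",
    "# Upper bound on tokens per update, since each packed sequence holds at most max_length tokens.\n",
    "max_tokens_per_step = effective_batch * max_length\n",
    "\n",
    "# The LinearLR warmup ends at warmup_ratio * max_epochs = 0.09 epochs.\n",
    "warmup_iters = int(warmup_ratio * max_epochs * iters_per_epoch)\n",
    "\n",
    "print(f'iterations: {total_iters}, optimizer steps: about {optimizer_steps}')\n",
    "print(f'effective batch: {effective_batch} sequences (at most {max_tokens_per_step} tokens)')\n",
    "print(f'warmup: about {warmup_iters} iterations')"
   ],
   "metadata": {},
   "execution_count": null,
   "outputs": []
  },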
  {
   "cell_type": "code",
   "source": [
    "!xtuner train gemma_2b_it_qlora_alpaca_e3"
   ],
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "2pGX3hLubhkJ",
    "outputId": "a8b202f6-bb1f-43ab-82b7-dc176a7131e8"
   },
   "execution_count": 11,
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "06/02 03:40:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - \n",
      "------------------------------------------------------------\n",
      "System environment:\n",
      "    sys.platform: linux\n",
      "    Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]\n",
      "    CUDA available: True\n",
      "    MUSA available: False\n",
      "    numpy_random_seed: 204097869\n",
      "    GPU 0: NVIDIA A100-SXM4-40GB\n",
      "    CUDA_HOME: /usr/local/cuda\n",
      "    NVCC: Cuda compilation tools, release 12.2, V12.2.140\n",
      "    GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0\n",
      "    PyTorch: 2.3.0+cu121\n",
      "    PyTorch compiling details: PyTorch built with:\n",
      "  - GCC 9.3\n",
      "  - C++ Version: 201703\n",
      "  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications\n",
      "  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)\n",
      "  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n",
      "  - LAPACK is enabled (usually provided by MKL)\n",
      "  - NNPACK is enabled\n",
      "  - CPU capability usage: AVX512\n",
      "  - CUDA Runtime 12.1\n",
      "  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n",
      "  - CuDNN 8.9.2\n",
      "  - Magma 2.6.1\n",
      "  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n",
      "\n",
      "    TorchVision: 0.18.0+cu121\n",
      "    OpenCV: 4.8.0\n",
      "    MMEngine: 0.10.4\n",
      "\n",
      "Runtime environment:\n",
      "    cudnn_benchmark: False\n",
      "    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}\n",
      "    dist_cfg: {'backend': 'nccl'}\n",
      "    seed: 204097869\n",
      "    deterministic: False\n",
      "    Distributed launcher: none\n",
      "    Distributed training: False\n",
      "    GPU number: 1\n",
      "------------------------------------------------------------\n",
      "\n",
      "06/02 03:40:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Config:\n",
      "SYSTEM = 'xtuner.utils.SYSTEM_TEMPLATE.alpaca'\n",
      "accumulative_counts = 16\n",
      "alpaca_en = dict(\n",
      "    dataset=dict(path='tatsu-lab/alpaca', type='datasets.load_dataset'),\n",
      "    dataset_map_fn='xtuner.dataset.map_fns.alpaca_map_fn',\n",
      "    max_length=2048,\n",
      "    pack_to_max_length=True,\n",
      "    remove_unused_columns=True,\n",
      "    shuffle_before_pack=True,\n",
      "    template_map_fn=dict(\n",
      "        template='xtuner.utils.PROMPT_TEMPLATE.gemma',\n",
      "        type='xtuner.dataset.map_fns.template_map_fn_factory'),\n",
      "    tokenizer=dict(\n",
      "        padding_side='right',\n",
      "        pretrained_model_name_or_path='google/gemma-2b-it',\n",
      "        trust_remote_code=True,\n",
      "        type='transformers.AutoTokenizer.from_pretrained'),\n",
      "    type='xtuner.dataset.process_hf_dataset',\n",
      "    use_varlen_attn=False)\n",
      "alpaca_en_path = 'tatsu-lab/alpaca'\n",
      "batch_size = 1\n",
      "betas = (\n",
      "    0.9,\n",
      "    0.999,\n",
      ")\n",
      "custom_hooks = [\n",
      "    dict(\n",
      "        tokenizer=dict(\n",
      "            padding_side='right',\n",
      "            pretrained_model_name_or_path='google/gemma-2b-it',\n",
      "            trust_remote_code=True,\n",
      "            type='transformers.AutoTokenizer.from_pretrained'),\n",
      "        type='xtuner.engine.hooks.DatasetInfoHook'),\n",
      "    dict(\n",
      "        evaluation_inputs=[\n",
      "            '请给我介绍五个上海的景点',\n",
      "            'Please tell me five scenic spots in Shanghai',\n",
      "        ],\n",
      "        every_n_iters=500,\n",
      "        prompt_template='xtuner.utils.PROMPT_TEMPLATE.gemma',\n",
      "        system='xtuner.utils.SYSTEM_TEMPLATE.alpaca',\n",
      "        tokenizer=dict(\n",
      "            padding_side='right',\n",
      "            pretrained_model_name_or_path='google/gemma-2b-it',\n",
      "            trust_remote_code=True,\n",
      "            type='transformers.AutoTokenizer.from_pretrained'),\n",
      "        type='xtuner.engine.hooks.EvaluateChatHook'),\n",
      "]\n",
      "dataloader_num_workers = 0\n",
      "default_hooks = dict(\n",
      "    checkpoint=dict(\n",
      "        by_epoch=False,\n",
      "        interval=500,\n",
      "        max_keep_ckpts=2,\n",
      "        type='mmengine.hooks.CheckpointHook'),\n",
      "    logger=dict(\n",
      "        interval=10,\n",
      "        log_metric_by_epoch=False,\n",
      "        type='mmengine.hooks.LoggerHook'),\n",
      "    param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),\n",
      "    sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),\n",
      "    timer=dict(type='mmengine.hooks.IterTimerHook'))\n",
      "env_cfg = dict(\n",
      "    cudnn_benchmark=False,\n",
      "    dist_cfg=dict(backend='nccl'),\n",
      "    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))\n",
      "evaluation_freq = 500\n",
      "evaluation_inputs = [\n",
      "    '请给我介绍五个上海的景点',\n",
      "    'Please tell me five scenic spots in Shanghai',\n",
      "]\n",
      "launcher = 'none'\n",
      "load_from = None\n",
      "log_level = 'INFO'\n",
      "log_processor = dict(by_epoch=False)\n",
      "lr = 0.0002\n",
      "max_epochs = 3\n",
      "max_length = 2048\n",
      "max_norm = 1\n",
      "model = dict(\n",
      "    llm=dict(\n",
      "        pretrained_model_name_or_path='google/gemma-2b-it',\n",
      "        quantization_config=dict(\n",
      "            bnb_4bit_compute_dtype='torch.float16',\n",
      "            bnb_4bit_quant_type='nf4',\n",
      "            bnb_4bit_use_double_quant=True,\n",
      "            llm_int8_has_fp16_weight=False,\n",
      "            llm_int8_threshold=6.0,\n",
      "            load_in_4bit=True,\n",
      "            load_in_8bit=False,\n",
      "            type='transformers.BitsAndBytesConfig'),\n",
      "        torch_dtype='torch.float16',\n",
      "        trust_remote_code=True,\n",
      "        type='transformers.AutoModelForCausalLM.from_pretrained'),\n",
      "    lora=dict(\n",
      "        bias='none',\n",
      "        lora_alpha=16,\n",
      "        lora_dropout=0.1,\n",
      "        r=64,\n",
      "        task_type='CAUSAL_LM',\n",
      "        type='peft.LoraConfig'),\n",
      "    type='xtuner.model.SupervisedFinetune',\n",
      "    use_varlen_attn=False)\n",
      "optim_type = 'torch.optim.AdamW'\n",
      "optim_wrapper = dict(\n",
      "    accumulative_counts=16,\n",
      "    clip_grad=dict(error_if_nonfinite=False, max_norm=1),\n",
      "    dtype='float16',\n",
      "    loss_scale='dynamic',\n",
      "    optimizer=dict(\n",
      "        betas=(\n",
      "            0.9,\n",
      "            0.999,\n",
      "        ),\n",
      "        lr=0.0002,\n",
      "        type='torch.optim.AdamW',\n",
      "        weight_decay=0),\n",
      "    type='mmengine.optim.AmpOptimWrapper')\n",
      "pack_to_max_length = True\n",
      "param_scheduler = [\n",
      "    dict(\n",
      "        begin=0,\n",
      "        by_epoch=True,\n",
      "        convert_to_iter_based=True,\n",
      "        end=0.09,\n",
      "        start_factor=1e-05,\n",
      "        type='mmengine.optim.LinearLR'),\n",
      "    dict(\n",
      "        begin=0.09,\n",
      "        by_epoch=True,\n",
      "        convert_to_iter_based=True,\n",
      "        end=3,\n",
      "        eta_min=0.0,\n",
      "        type='mmengine.optim.CosineAnnealingLR'),\n",
      "]\n",
      "pretrained_model_name_or_path = 'google/gemma-2b-it'\n",
      "prompt_template = 'xtuner.utils.PROMPT_TEMPLATE.gemma'\n",
      "randomness = dict(deterministic=False, seed=None)\n",
      "resume = False\n",
      "save_steps = 500\n",
      "save_total_limit = 2\n",
      "tokenizer = dict(\n",
      "    padding_side='right',\n",
      "    pretrained_model_name_or_path='google/gemma-2b-it',\n",
      "    trust_remote_code=True,\n",
      "    type='transformers.AutoTokenizer.from_pretrained')\n",
      "train_cfg = dict(max_epochs=3, type='xtuner.engine.runner.TrainLoop')\n",
      "train_dataloader = dict(\n",
      "    batch_size=1,\n",
      "    collate_fn=dict(\n",
      "        type='xtuner.dataset.collate_fns.default_collate_fn',\n",
      "        use_varlen_attn=False),\n",
      "    dataset=dict(\n",
      "        dataset=dict(path='tatsu-lab/alpaca', type='datasets.load_dataset'),\n",
      "        dataset_map_fn='xtuner.dataset.map_fns.alpaca_map_fn',\n",
      "        max_length=2048,\n",
      "        pack_to_max_length=True,\n",
      "        remove_unused_columns=True,\n",
      "        shuffle_before_pack=True,\n",
      "        template_map_fn=dict(\n",
      "            template='xtuner.utils.PROMPT_TEMPLATE.gemma',\n",
      "            type='xtuner.dataset.map_fns.template_map_fn_factory'),\n",
      "        tokenizer=dict(\n",
      "            padding_side='right',\n",
      "            pretrained_model_name_or_path='google/gemma-2b-it',\n",
      "            trust_remote_code=True,\n",
      "            type='transformers.AutoTokenizer.from_pretrained'),\n",
      "        type='xtuner.dataset.process_hf_dataset',\n",
      "        use_varlen_attn=False),\n",
      "    num_workers=0,\n",
      "    sampler=dict(shuffle=True, type='mmengine.dataset.DefaultSampler'))\n",
      "use_varlen_attn = False\n",
      "visualizer = None\n",
      "warmup_ratio = 0.03\n",
      "weight_decay = 0\n",
      "work_dir = './work_dirs/gemma_2b_it_qlora_alpaca_e3'\n",
      "\n",
      "quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>\n",
      "06/02 03:40:31 - mmengine - \u001b[5m\u001b[4m\u001b[33mWARNING\u001b[0m - Failed to search registry with scope \"mmengine\" in the \"builder\" registry tree. As a workaround, the current \"builder\" registry in \"xtuner\" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether \"mmengine\" is a correct scope, or whether the registry is initialized.\n",
      "`low_cpu_mem_usage` was None, now set to True since model is quantized.\n",
      "`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.\n",
      "Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use\n",
      "`config.hidden_activation` if you want to override this behaviour.\n",
      "See https://github.com/huggingface/transformers/pull/29402 for more details.\n",
      "Loading checkpoint shards: 100% 2/2 [00:02<00:00,  1.49s/it]\n",
      "06/02 03:40:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.\n",
      "06/02 03:40:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Hooks will be executed in the following order:\n",
      "before_run:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      "(BELOW_NORMAL) LoggerHook                         \n",
      " -------------------- \n",
      "before_train:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      "(NORMAL      ) IterTimerHook                      \n",
      "(NORMAL      ) DatasetInfoHook                    \n",
      "(LOW         ) EvaluateChatHook                   \n",
      "(VERY_LOW    ) CheckpointHook                     \n",
      " -------------------- \n",
      "before_train_epoch:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      "(NORMAL      ) IterTimerHook                      \n",
      "(NORMAL      ) DistSamplerSeedHook                \n",
      " -------------------- \n",
      "before_train_iter:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      "(NORMAL      ) IterTimerHook                      \n",
      " -------------------- \n",
      "after_train_iter:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      "(NORMAL      ) IterTimerHook                      \n",
      "(BELOW_NORMAL) LoggerHook                         \n",
      "(LOW         ) ParamSchedulerHook                 \n",
      "(LOW         ) EvaluateChatHook                   \n",
      "(VERY_LOW    ) CheckpointHook                     \n",
      " -------------------- \n",
      "after_train_epoch:\n",
      "(NORMAL      ) IterTimerHook                      \n",
      "(LOW         ) ParamSchedulerHook                 \n",
      "(VERY_LOW    ) CheckpointHook                     \n",
      " -------------------- \n",
      "before_val:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      "(NORMAL      ) DatasetInfoHook                    \n",
      " -------------------- \n",
      "before_val_epoch:\n",
      "(NORMAL      ) IterTimerHook                      \n",
      " -------------------- \n",
      "before_val_iter:\n",
      "(NORMAL      ) IterTimerHook                      \n",
      " -------------------- \n",
      "after_val_iter:\n",
      "(NORMAL      ) IterTimerHook                      \n",
      "(BELOW_NORMAL) LoggerHook                         \n",
      " -------------------- \n",
      "after_val_epoch:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      "(NORMAL      ) IterTimerHook                      \n",
      "(BELOW_NORMAL) LoggerHook                         \n",
      "(LOW         ) ParamSchedulerHook                 \n",
      "(VERY_LOW    ) CheckpointHook                     \n",
      " -------------------- \n",
      "after_val:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      "(LOW         ) EvaluateChatHook                   \n",
      " -------------------- \n",
      "after_train:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      "(LOW         ) EvaluateChatHook                   \n",
      "(VERY_LOW    ) CheckpointHook                     \n",
      " -------------------- \n",
      "before_test:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      "(NORMAL      ) DatasetInfoHook                    \n",
      " -------------------- \n",
      "before_test_epoch:\n",
      "(NORMAL      ) IterTimerHook                      \n",
      " -------------------- \n",
      "before_test_iter:\n",
      "(NORMAL      ) IterTimerHook                      \n",
      " -------------------- \n",
      "after_test_iter:\n",
      "(NORMAL      ) IterTimerHook                      \n",
      "(BELOW_NORMAL) LoggerHook                         \n",
      " -------------------- \n",
      "after_test_epoch:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      "(NORMAL      ) IterTimerHook                      \n",
      "(BELOW_NORMAL) LoggerHook                         \n",
      " -------------------- \n",
      "after_test:\n",
      "(VERY_HIGH   ) RuntimeInfoHook                    \n",
      " -------------------- \n",
      "after_run:\n",
      "(BELOW_NORMAL) LoggerHook                         \n",
      " -------------------- \n",
      "Flattening the indices (num_proc=32): 100% 51979/51979 [00:00<00:00, 110082.27 examples/s]\n",
      "Map (num_proc=32): 100% 51979/51979 [00:01<00:00, 46861.38 examples/s]\n",
      "Map (num_proc=32): 100% 2181/2181 [00:00<00:00, 3215.12 examples/s]\n",
      "06/02 03:40:45 - mmengine - \u001b[5m\u001b[4m\u001b[33mWARNING\u001b[0m - Dataset Dataset has no metainfo. ``dataset_meta`` in visualizer will be None.\n",
      "06/02 03:40:45 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Num train samples 2181\n",
      "06/02 03:40:45 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - train example:\n",
      "2024-06-02 03:40:45.502601: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
      "2024-06-02 03:40:45.553683: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
      "2024-06-02 03:40:45.553728: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
      "2024-06-02 03:40:45.555768: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
      "2024-06-02 03:40:45.563715: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
      "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "2024-06-02 03:40:46.650869: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
      "06/02 03:40:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - <bos><start_of_turn>user\n",
      "Analyze what are the differences between two products and provide a conclusion on which one is better.\n",
      "Product 1: Automatically synchronizes data across all the users.\n",
      "Product 2: Allows users to share files, with automatically synced versions.<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Product 1 is better, as it offers comprehensive automatic synchronization of data across all users which eliminates the need for manual synchronization and ensures data consistency. On the other hand, Product 2 allows users to share files but does not have the automatic synchronization capabilities of Product 1.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Give a health measurement to this person\n",
      "Tim Blake Nelson<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Sorry, the task involves personal information. Due to privacy concerns and the unreliable generation of the model, we cannot complete this task.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Describe the behavior of a person who is on the autism spectrum\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "A person on the autism spectrum may display behaviors such as communicating in unconventional ways, having difficulty forming social relationships, having intense interests and focusing on certain topics to the exclusion of others, and having issues with sensory stimuli. They may have difficulty interpreting other people's facial expressions and body language, and engaging in play. They may also have a wide range of learning abilities, including high intelligence, and may respond well to structure and routine.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Name three methods for authentication.\n",
      "<none><end_of_turn>\n",
      "<start_of_turn>model\n",
      "Three methods for authentication are: passwords, two-factor authentication, and biometrics.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Write a story involving a computer programmer from the future.\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Madelyne was a computer programmer from the future. Her job was to create and improve upon existing computer systems designed to make life in the future easier and more efficient. She had a knack for problem solving and creativity which helped her to create some of the most innovative computer systems ever used. \n",
      "\n",
      "One day, she was tasked with the challenge of creating a program that could predict the stock market. The task seemed daunting and was even thought of as impossible by some, but Madelyne was determined to find a way to make it work. After spending months working on the program, she managed to develop a predictive algorithm that proved to be remarkably accurate. This algorithm instantly made Madelyne a tech celebrity and created a sensation in the tech industry.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Calculate the median of the following data set.\n",
      "1, 2, 4, 5, 8, 9<end_of_turn>\n",
      "<start_of_turn>model\n",
      "5<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "In not more than two sentences, summarize the purpose of Machine Learning.\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "The purpose of Machine Learning is to develop algorithms and systems that can learn from data and use that knowledge to make predictions, decisions, and recommendations.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Create a web page for a given topic.\n",
      "Topic: Outdoor Adventure<end_of_turn>\n",
      "<start_of_turn>model\n",
      "A web page for the topic of Outdoor Adventure, featuring a full-width picture at the top with engaging content below. The page should have an introduction describing the topic, listing out different adventures that can be included under the category, including detailed information on each adventure. It should also feature visitor reviews or ratings where applicable. There should be multiple tabs available, such as a FAQ page, to explore further. Additionally, there should be a booking or contact page with detailed information on how to get started and connect with an organizer or adventure professional.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Write an annotation for a website about animal conservation.\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "This website provides information on animal conservation, focusing on strategies to protect endangered species and understanding the current state of wildlife populations. It provides resources such as articles, reports, videos, and galleries to explore different topics related to conservation. It is meant to educate people about the importance of saving endangered species and inspiring action to protect them.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Write the most accurate summary of the passage in less than 100 words.\n",
      "With the introduction of machine learning and deep learning into the modern world, engineers have developed systems that are capable of extracting important information from large data sets without any human input. These systems, referred to as “intelligent agents”, are able to learn from the data, find patterns, and make informed decisions. Intelligent agents are being used in a variety of domains, such as healthcare, finance, manufacturing, and transportation.<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Intelligent agents are systems that use machine learning and deep learning to analyze data and make informed decisions without the need of human interaction. These agents have found success in a variety of domains such as healthcare, finance, manufacturing, and transportation. They learn from the data, find patterns, and use this information to make decisions.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Calculate the sum of 1, 4, and 9.\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "The sum of 1, 4, and 9 is 14.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Translate the phrase \"Life is a journey\" into a foreign language of your choice.\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "La vida es un viaje (Spanish).<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Convert the following code snippet into a pseudocode.\n",
      "while (counter < limit):\n",
      "    print (counter)\n",
      "    counter += 5<end_of_turn>\n",
      "<start_of_turn>model\n",
      "WHILE counter is less than limit DO\n",
      "    PRINT counter\n",
      "    ADD 5 to counter\n",
      "END WHILE<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Generate a list of questions about a given real world topics.\n",
      "Apple's products<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. What are the different products that Apple offers?\n",
      "2. How have Apple's products evolved over time?\n",
      "3. What distinguishes Apple's products from those of its competitors?\n",
      "4. How have Apple's new products changed the tech industry?\n",
      "5. What are the pros and cons of Apple's products?\n",
      "6. What cutting-edge features does Apple's products offer?\n",
      "7. How does Apple ensure that their products are secure?\n",
      "8. How does Apple's design aesthetic influence their products?\n",
      "9. What impact have Apple's products had on society?\n",
      "10. What strategies does Apple use to promote its products?<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Name a tool for connecting with customers on social media.\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Hootsuite.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Develop a plan for preparing a turkey for Thanksgiving.\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Plan for preparing a turkey for Thanksgiving:\n",
      "- Thaw turkey in fridge\n",
      "- Preheat oven to 375°F\n",
      "- Rub turkey with oil, salt and pepper\n",
      "- Place in roasting pan with vegetables\n",
      "- Insert meat thermometer into breast\n",
      "- Bake according to time and temperature recommendations\n",
      "- Let rest before carving and serving<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "outline the steps for teaching a robot to sort coins\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "The steps for teaching a robot to sort coins are as follows:\n",
      "1. Design a dataset of coins with different sizes, shapes, and weights to use for training the robot. \n",
      "2. Develop a feature-extraction algorithm to detect properties of each coin, such as size and shape.\n",
      "3. Develop a classification algorithm to differentiate between types of coins, such as penny, nickel, dime, and quarter.\n",
      "4. Train the robot using supervised machine learning and the dataset of coins.\n",
      "5. Test the robot’s accuracy and refine the algorithm if necessary.\n",
      "6. Program the robot with the final algorithm for sorting coins.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Explain a current trend in the given field.\n",
      "Technology<end_of_turn>\n",
      "<start_of_turn>model\n",
      "One of the big trends in technology today is artificial intelligence (AI). AI allows machines to learn and process tasks that would otherwise be too complex for a human being. It has revolutionized many industries, from healthcare to computing. AI-powered chatbots are being used to streamline customer service, while autonomous vehicles have the potential to drastically change transportation. AI is also being used to generate insights in the fields of finance, marketing, and even art. This versatile technology will continue to develop and revolutionize the way we interact with technology in the future.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Describe the impact of new technologies on the market.\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "New technologies have transformed the marketplace in many ways. They have enabled businesses to reach a larger customer base and helped them to access new markets, which has resulted in increased competition and lowered prices. They’ve also enabled greater specialization and helped businesses to target their services and products more effectively, leading to better customer service and improved innovation.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Create an input to the following command\n",
      "Please create a file<end_of_turn>\n",
      "<start_of_turn>model\n",
      "\"Create a new file named \"example.txt\" in the root directory\".<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "List a few characteristics of a volcano\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "The primary characteristics of a volcano include the crater, vent, or magma chamber, a conduit filled with magma, and a surrounding area consisting of rock fragments, ash, lava flows, and gas emissions. Additionally, volcanoes may have a lava dome, lava curtain, fumaroles, and hydrothermal activity. Volcanoes also vary in size, shape, and type.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Generate 5 interesting pieces of trivia.\n",
      "<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. The world's smallest mammal is the Bumblebee Bat, which is just 1.5 inches in length.\n",
      "2. The Mongolian language has more than 130 words for the color blue.\n",
      "3. The Great Barrier Reef is the world’s largest coral reef system and can be seen from outer space.\n",
      "4. It rains diamonds on Saturn and Jupiter.\n",
      "5. The odds of tossing a coin and getting all heads are 1 in a 256.<end_of_turn><eos>\n",
      "<bos><start_of_turn>user\n",
      "Explain why conservation of\n",
      "06/02 03:40:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - before_train in EvaluateChatHook.\n",
      "06/02 03:40:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "好的，以下五个是上海的景点：\n",
      "\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海金山寺\n",
      "5. 上海南山公园<eos>\n",
      "\n",
      "06/02 03:40:54 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. **Yu Garden**\n",
      "2. **Nanputuo Temple**\n",
      "3. **The Bund**\n",
      "4. **Huangpu Island**\n",
      "5. **Shengsi Island**<eos>\n",
      "\n",
      "06/02 03:40:54 - mmengine - \u001b[5m\u001b[4m\u001b[33mWARNING\u001b[0m - \"FileClient\" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io\n",
      "06/02 03:40:54 - mmengine - \u001b[5m\u001b[4m\u001b[33mWARNING\u001b[0m - \"HardDiskBackend\" is the alias of \"LocalBackend\" and the former will be deprecated in future.\n",
      "06/02 03:40:54 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Checkpoints will be saved to /content/work_dirs/gemma_2b_it_qlora_alpaca_e3.\n",
      "/usr/local/lib/python3.10/dist-packages/mmengine/optim/scheduler/param_scheduler.py:198: UserWarning: Detected call of `scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the parameter value schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
      "  warnings.warn(\n",
      "06/02 03:40:59 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [  10/6543]  lr: 9.2327e-06  eta: 0:55:12  time: 0.5070  data_time: 0.0080  memory: 10910  loss: 2.9300\n",
      "06/02 03:41:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [  20/6543]  lr: 1.9489e-05  eta: 0:55:22  time: 0.5117  data_time: 0.0081  memory: 11508  loss: 2.7452  grad_norm: 0.7290\n",
      "06/02 03:41:09 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [  30/6543]  lr: 2.9745e-05  eta: 0:54:38  time: 0.4914  data_time: 0.0078  memory: 11508  loss: 2.8962  grad_norm: 0.7290\n",
      "06/02 03:41:14 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [  40/6543]  lr: 4.0002e-05  eta: 0:54:15  time: 0.4924  data_time: 0.0080  memory: 11508  loss: 2.7766  grad_norm: 0.7173\n",
      "06/02 03:41:19 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [  50/6543]  lr: 5.0258e-05  eta: 0:54:00  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 2.6165  grad_norm: 0.7020\n",
      "06/02 03:41:24 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [  60/6543]  lr: 6.0514e-05  eta: 0:53:46  time: 0.4915  data_time: 0.0078  memory: 11508  loss: 2.6166  grad_norm: 0.7020\n",
      "06/02 03:41:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [  70/6543]  lr: 7.0771e-05  eta: 0:53:36  time: 0.4923  data_time: 0.0079  memory: 11508  loss: 2.7093  grad_norm: 0.6951\n",
      "06/02 03:41:34 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [  80/6543]  lr: 8.1027e-05  eta: 0:53:28  time: 0.4923  data_time: 0.0079  memory: 11508  loss: 2.5303  grad_norm: 0.6960\n",
      "06/02 03:41:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [  90/6543]  lr: 9.1283e-05  eta: 0:53:19  time: 0.4916  data_time: 0.0079  memory: 11508  loss: 2.3651  grad_norm: 0.6960\n",
      "06/02 03:41:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 100/6543]  lr: 1.0154e-04  eta: 0:53:12  time: 0.4924  data_time: 0.0077  memory: 11508  loss: 2.5326  grad_norm: 0.6779\n",
      "06/02 03:41:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 110/6543]  lr: 1.1180e-04  eta: 0:53:05  time: 0.4919  data_time: 0.0080  memory: 11508  loss: 2.3739  grad_norm: 0.6779\n",
      "06/02 03:41:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 120/6543]  lr: 1.2205e-04  eta: 0:52:59  time: 0.4924  data_time: 0.0079  memory: 11508  loss: 2.4133  grad_norm: 0.6568\n",
      "06/02 03:41:58 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 130/6543]  lr: 1.3231e-04  eta: 0:52:53  time: 0.4932  data_time: 0.0087  memory: 11508  loss: 2.2508  grad_norm: 0.6879\n",
      "06/02 03:42:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 140/6543]  lr: 1.4256e-04  eta: 0:52:47  time: 0.4921  data_time: 0.0080  memory: 11508  loss: 2.3256  grad_norm: 0.6879\n",
      "06/02 03:42:08 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 150/6543]  lr: 1.5282e-04  eta: 0:52:41  time: 0.4923  data_time: 0.0077  memory: 11508  loss: 2.2082  grad_norm: 0.7682\n",
      "06/02 03:42:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 160/6543]  lr: 1.6308e-04  eta: 0:52:35  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 2.1557  grad_norm: 0.7858\n",
      "06/02 03:42:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 170/6543]  lr: 1.7333e-04  eta: 0:52:29  time: 0.4917  data_time: 0.0079  memory: 11508  loss: 2.1435  grad_norm: 0.7858\n",
      "06/02 03:42:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 180/6543]  lr: 1.8359e-04  eta: 0:52:23  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 2.0410  grad_norm: 0.7496\n",
      "06/02 03:42:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 190/6543]  lr: 1.9385e-04  eta: 0:52:18  time: 0.4921  data_time: 0.0080  memory: 11508  loss: 2.0182  grad_norm: 0.7496\n",
      "06/02 03:42:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 200/6543]  lr: 2.0000e-04  eta: 0:52:12  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.9503  grad_norm: 0.7131\n",
      "06/02 03:42:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 210/6543]  lr: 2.0000e-04  eta: 0:52:07  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.9286  grad_norm: 0.6786\n",
      "06/02 03:42:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 220/6543]  lr: 1.9999e-04  eta: 0:52:01  time: 0.4916  data_time: 0.0077  memory: 11508  loss: 1.8610  grad_norm: 0.6786\n",
      "06/02 03:42:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 230/6543]  lr: 1.9999e-04  eta: 0:51:56  time: 0.4931  data_time: 0.0078  memory: 11508  loss: 1.9621  grad_norm: 0.6424\n",
      "06/02 03:42:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 240/6543]  lr: 1.9998e-04  eta: 0:51:51  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.8921  grad_norm: 0.6018\n",
      "06/02 03:42:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 250/6543]  lr: 1.9997e-04  eta: 0:51:46  time: 0.4913  data_time: 0.0077  memory: 11508  loss: 1.9898  grad_norm: 0.6018\n",
      "06/02 03:43:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 260/6543]  lr: 1.9995e-04  eta: 0:51:41  time: 0.4932  data_time: 0.0080  memory: 11508  loss: 1.8048  grad_norm: 0.5725\n",
      "06/02 03:43:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 270/6543]  lr: 1.9993e-04  eta: 0:51:35  time: 0.4919  data_time: 0.0080  memory: 11508  loss: 1.7975  grad_norm: 0.5725\n",
      "06/02 03:43:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 280/6543]  lr: 1.9992e-04  eta: 0:51:30  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.6763  grad_norm: 0.5503\n",
      "06/02 03:43:17 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 290/6543]  lr: 1.9989e-04  eta: 0:51:25  time: 0.4923  data_time: 0.0077  memory: 11508  loss: 1.7768  grad_norm: 0.4831\n",
      "06/02 03:43:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 300/6543]  lr: 1.9987e-04  eta: 0:51:20  time: 0.4917  data_time: 0.0077  memory: 11508  loss: 1.7964  grad_norm: 0.4831\n",
      "06/02 03:43:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 310/6543]  lr: 1.9984e-04  eta: 0:51:15  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.7690  grad_norm: 0.3635\n",
      "06/02 03:43:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 320/6543]  lr: 1.9981e-04  eta: 0:51:10  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.7936  grad_norm: 0.2894\n",
      "06/02 03:43:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 330/6543]  lr: 1.9978e-04  eta: 0:51:04  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.7931  grad_norm: 0.2894\n",
      "06/02 03:43:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 340/6543]  lr: 1.9975e-04  eta: 0:50:59  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.6962  grad_norm: 0.2802\n",
      "06/02 03:43:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 350/6543]  lr: 1.9971e-04  eta: 0:50:54  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.7589  grad_norm: 0.2802\n",
      "06/02 03:43:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 360/6543]  lr: 1.9967e-04  eta: 0:50:49  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.7272  grad_norm: 0.2715\n",
      "06/02 03:43:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 370/6543]  lr: 1.9963e-04  eta: 0:50:44  time: 0.4922  data_time: 0.0077  memory: 11508  loss: 1.6756  grad_norm: 0.2585\n",
      "06/02 03:44:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 380/6543]  lr: 1.9959e-04  eta: 0:50:39  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.6761  grad_norm: 0.2585\n",
      "06/02 03:44:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 390/6543]  lr: 1.9954e-04  eta: 0:50:34  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.7016  grad_norm: 0.2464\n",
      "06/02 03:44:11 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 400/6543]  lr: 1.9950e-04  eta: 0:50:29  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.6812  grad_norm: 0.2417\n",
      "06/02 03:44:16 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 410/6543]  lr: 1.9944e-04  eta: 0:50:24  time: 0.4914  data_time: 0.0077  memory: 11508  loss: 1.6650  grad_norm: 0.2417\n",
      "06/02 03:44:21 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 420/6543]  lr: 1.9939e-04  eta: 0:50:18  time: 0.4924  data_time: 0.0077  memory: 11508  loss: 1.7206  grad_norm: 0.2330\n",
      "06/02 03:44:26 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 430/6543]  lr: 1.9934e-04  eta: 0:50:13  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.6426  grad_norm: 0.2330\n",
      "06/02 03:44:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 440/6543]  lr: 1.9928e-04  eta: 0:50:08  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.5951  grad_norm: 0.2184\n",
      "06/02 03:44:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 450/6543]  lr: 1.9922e-04  eta: 0:50:03  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.5709  grad_norm: 0.2168\n",
      "06/02 03:44:41 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 460/6543]  lr: 1.9915e-04  eta: 0:49:58  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.5605  grad_norm: 0.2168\n",
      "06/02 03:44:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 470/6543]  lr: 1.9909e-04  eta: 0:49:53  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.6093  grad_norm: 0.2131\n",
      "06/02 03:44:50 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 480/6543]  lr: 1.9902e-04  eta: 0:49:48  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.6858  grad_norm: 0.2115\n",
      "06/02 03:44:55 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 490/6543]  lr: 1.9895e-04  eta: 0:49:43  time: 0.4916  data_time: 0.0078  memory: 11508  loss: 1.6414  grad_norm: 0.2115\n",
      "06/02 03:45:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 500/6543]  lr: 1.9888e-04  eta: 0:49:38  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.6135  grad_norm: 0.2006\n",
      "06/02 03:45:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 03:45:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海博物馆\n",
      "4. 上海博物馆\n",
      "5. 上海博物馆<eos>\n",
      "\n",
      "06/02 03:45:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. The Bund\n",
      "2. The Oriental Pearl Tower\n",
      "3. The Shanghai Tower\n",
      "4. The People's Square\n",
      "5. The Yu Garden\n",
      "\n",
      "These spots offer stunning views of the city and are a great way to experience the beauty of Shanghai. parcs<eos>\n",
      "\n",
      "06/02 03:45:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 500 iterations\n",
      "06/02 03:45:14 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 510/6543]  lr: 1.9880e-04  eta: 0:51:15  time: 1.3525  data_time: 0.8650  memory: 11508  loss: 1.4946  grad_norm: 0.2006\n",
      "06/02 03:45:19 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 520/6543]  lr: 1.9872e-04  eta: 0:51:08  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.6221  grad_norm: 0.1925\n",
      "06/02 03:45:24 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 530/6543]  lr: 1.9864e-04  eta: 0:51:01  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.5395  grad_norm: 0.1895\n",
      "06/02 03:45:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 540/6543]  lr: 1.9856e-04  eta: 0:50:54  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.5888  grad_norm: 0.1895\n",
      "06/02 03:45:34 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 550/6543]  lr: 1.9848e-04  eta: 0:50:47  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.4811  grad_norm: 0.1873\n",
      "06/02 03:45:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 560/6543]  lr: 1.9839e-04  eta: 0:50:40  time: 0.4927  data_time: 0.0078  memory: 11508  loss: 1.5247  grad_norm: 0.1851\n",
      "06/02 03:45:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 570/6543]  lr: 1.9830e-04  eta: 0:50:33  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.5286  grad_norm: 0.1851\n",
      "06/02 03:45:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 580/6543]  lr: 1.9821e-04  eta: 0:50:27  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.5043  grad_norm: 0.1775\n",
      "06/02 03:45:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 590/6543]  lr: 1.9811e-04  eta: 0:50:20  time: 0.4918  data_time: 0.0077  memory: 11508  loss: 1.5938  grad_norm: 0.1775\n",
      "06/02 03:45:58 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 600/6543]  lr: 1.9802e-04  eta: 0:50:14  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.5329  grad_norm: 0.1813\n",
      "06/02 03:46:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 610/6543]  lr: 1.9792e-04  eta: 0:50:07  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.6136  grad_norm: 0.1735\n",
      "06/02 03:46:08 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 620/6543]  lr: 1.9782e-04  eta: 0:50:01  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.5828  grad_norm: 0.1735\n",
      "06/02 03:46:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 630/6543]  lr: 1.9771e-04  eta: 0:49:54  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.5379  grad_norm: 0.1730\n",
      "06/02 03:46:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 640/6543]  lr: 1.9761e-04  eta: 0:49:48  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.5368  grad_norm: 0.1677\n",
      "06/02 03:46:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 650/6543]  lr: 1.9750e-04  eta: 0:49:41  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.5198  grad_norm: 0.1677\n",
      "06/02 03:46:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 660/6543]  lr: 1.9739e-04  eta: 0:49:35  time: 0.4924  data_time: 0.0077  memory: 11508  loss: 1.5456  grad_norm: 0.1658\n",
      "06/02 03:46:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 670/6543]  lr: 1.9727e-04  eta: 0:49:29  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.5068  grad_norm: 0.1658\n",
      "06/02 03:46:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 680/6543]  lr: 1.9716e-04  eta: 0:49:23  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.4651  grad_norm: 0.1621\n",
      "06/02 03:46:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 690/6543]  lr: 1.9704e-04  eta: 0:49:17  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.5114  grad_norm: 0.1601\n",
      "06/02 03:46:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 700/6543]  lr: 1.9692e-04  eta: 0:49:10  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.4929  grad_norm: 0.1601\n",
      "06/02 03:46:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 710/6543]  lr: 1.9679e-04  eta: 0:49:04  time: 0.4922  data_time: 0.0077  memory: 11508  loss: 1.4655  grad_norm: 0.1560\n",
      "06/02 03:46:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 720/6543]  lr: 1.9667e-04  eta: 0:48:58  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.5187  grad_norm: 0.1498\n",
      "06/02 03:47:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 730/6543]  lr: 1.9654e-04  eta: 0:48:52  time: 0.4917  data_time: 0.0077  memory: 11508  loss: 1.5252  grad_norm: 0.1498\n",
      "06/02 03:47:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 740/6543]  lr: 1.9641e-04  eta: 0:48:46  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.4807  grad_norm: 0.1496\n",
      "06/02 03:47:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 750/6543]  lr: 1.9628e-04  eta: 0:48:40  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.4889  grad_norm: 0.1496\n",
      "06/02 03:47:17 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 760/6543]  lr: 1.9614e-04  eta: 0:48:34  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.4703  grad_norm: 0.1476\n",
      "06/02 03:47:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 770/6543]  lr: 1.9600e-04  eta: 0:48:28  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.4873  grad_norm: 0.1465\n",
      "06/02 03:47:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 780/6543]  lr: 1.9587e-04  eta: 0:48:23  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.5339  grad_norm: 0.1465\n",
      "06/02 03:47:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 790/6543]  lr: 1.9572e-04  eta: 0:48:17  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.4493  grad_norm: 0.1465\n",
      "06/02 03:47:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 800/6543]  lr: 1.9558e-04  eta: 0:48:11  time: 0.4924  data_time: 0.0077  memory: 11508  loss: 1.5304  grad_norm: 0.1473\n",
      "06/02 03:47:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 810/6543]  lr: 1.9543e-04  eta: 0:48:05  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.5206  grad_norm: 0.1473\n",
      "06/02 03:47:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 820/6543]  lr: 1.9528e-04  eta: 0:47:59  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.5063  grad_norm: 0.1591\n",
      "06/02 03:47:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 830/6543]  lr: 1.9513e-04  eta: 0:47:53  time: 0.4918  data_time: 0.0077  memory: 11508  loss: 1.5201  grad_norm: 0.1591\n",
      "06/02 03:47:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 840/6543]  lr: 1.9498e-04  eta: 0:47:48  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.4968  grad_norm: 0.2005\n",
      "06/02 03:48:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 850/6543]  lr: 1.9482e-04  eta: 0:47:42  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.4899  grad_norm: 0.2120\n",
      "06/02 03:48:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 860/6543]  lr: 1.9466e-04  eta: 0:47:36  time: 0.4923  data_time: 0.0080  memory: 11508  loss: 1.4523  grad_norm: 0.2120\n",
      "06/02 03:48:11 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 870/6543]  lr: 1.9450e-04  eta: 0:47:30  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.4808  grad_norm: 0.2130\n",
      "06/02 03:48:16 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 880/6543]  lr: 1.9434e-04  eta: 0:47:25  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.5183  grad_norm: 0.2122\n",
      "06/02 03:48:21 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 890/6543]  lr: 1.9417e-04  eta: 0:47:19  time: 0.4917  data_time: 0.0079  memory: 11508  loss: 1.4241  grad_norm: 0.2122\n",
      "06/02 03:48:26 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 900/6543]  lr: 1.9401e-04  eta: 0:47:14  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.4521  grad_norm: 0.2144\n",
      "06/02 03:48:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 910/6543]  lr: 1.9384e-04  eta: 0:47:08  time: 0.4923  data_time: 0.0080  memory: 11508  loss: 1.5048  grad_norm: 0.2144\n",
      "06/02 03:48:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 920/6543]  lr: 1.9366e-04  eta: 0:47:02  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.4406  grad_norm: 0.2124\n",
      "06/02 03:48:41 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 930/6543]  lr: 1.9349e-04  eta: 0:46:57  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.4483  grad_norm: 0.2155\n",
      "06/02 03:48:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 940/6543]  lr: 1.9331e-04  eta: 0:46:51  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.4120  grad_norm: 0.2155\n",
      "06/02 03:48:50 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 950/6543]  lr: 1.9313e-04  eta: 0:46:45  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.4787  grad_norm: 0.2132\n",
      "06/02 03:48:55 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 960/6543]  lr: 1.9295e-04  eta: 0:46:40  time: 0.4930  data_time: 0.0080  memory: 11508  loss: 1.4060  grad_norm: 0.2204\n",
      "06/02 03:49:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 970/6543]  lr: 1.9277e-04  eta: 0:46:34  time: 0.4916  data_time: 0.0077  memory: 11508  loss: 1.4994  grad_norm: 0.2204\n",
      "06/02 03:49:05 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 980/6543]  lr: 1.9258e-04  eta: 0:46:29  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.5139  grad_norm: 0.2086\n",
      "06/02 03:49:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [ 990/6543]  lr: 1.9240e-04  eta: 0:46:23  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.4952  grad_norm: 0.2086\n",
      "06/02 03:49:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Exp name: gemma_2b_it_qlora_alpaca_e3_20240602_034031\n",
      "06/02 03:49:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1000/6543]  lr: 1.9220e-04  eta: 0:46:18  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.4506  grad_norm: 0.1758\n",
      "06/02 03:49:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 03:49:59 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海博物馆\n",
      "4. 上海博物馆\n",
      "5. 上海博物馆 \n",
      "\n",
      "上海的景点是上海历史文化博物馆，是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海故宫是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海故宫是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海故宫是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海故宫是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海四大博物馆之一，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海市博物馆，是上海市文物保护单位，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海市博物馆，是上海市文物保护单位，收藏了中国古代文物。\n",
      "\n",
      "上海博物馆是上海市博物馆，收藏了中国古代文物，是上海市重要的文化教育资源。\n",
      "\n",
      "上海博物馆是上海市重要的文化教育资源，收藏了中国古代文物，是上海市重要的文化教育资源。\n",
      "\n",
      "上海博物馆是上海市重要的文化教育资源，收藏了中国古代文物，是上海市重要的文化教育资源。\n",
      "\n",
      "上海博物馆是上海市重要的文化教育资源，收藏了中国古代文物，是上海市重要的文化教育资源。\n",
      "\n",
      "上海博物馆是上海市重要的文化教育资源，收藏了中国古代文物，是上海市重要的文化教育资源。\n",
      "\n",
      "上海博物馆是上海市重要的文化教育资源，收藏了中国古代文物，是上海市重要的文化教育资源。\n",
      "\n",
      "上海博物馆是上海市重要的文化教育资源，收藏了中国\n",
      "\n",
      "06/02 03:50:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. The Bund\n",
      "2. The Oriental Pearl Tower\n",
      "3. The Yu Garden\n",
      "4. The Shanghai Museum\n",
      "5. The People's Square<eos>\n",
      "\n",
      "06/02 03:50:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 1000 iterations\n",
      "06/02 03:50:08 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1010/6543]  lr: 1.9201e-04  eta: 0:50:38  time: 5.3335  data_time: 4.8454  memory: 11508  loss: 1.4425  grad_norm: 0.1630\n",
      "06/02 03:50:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1020/6543]  lr: 1.9182e-04  eta: 0:50:29  time: 0.4915  data_time: 0.0078  memory: 11508  loss: 1.4466  grad_norm: 0.1630\n",
      "06/02 03:50:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1030/6543]  lr: 1.9162e-04  eta: 0:50:20  time: 0.4923  data_time: 0.0079  memory: 11508  loss: 1.5103  grad_norm: 0.1678\n",
      "06/02 03:50:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1040/6543]  lr: 1.9142e-04  eta: 0:50:12  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3785  grad_norm: 0.1702\n",
      "06/02 03:50:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1050/6543]  lr: 1.9122e-04  eta: 0:50:04  time: 0.4919  data_time: 0.0080  memory: 11508  loss: 1.4385  grad_norm: 0.1702\n",
      "06/02 03:50:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1060/6543]  lr: 1.9101e-04  eta: 0:49:55  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.3357  grad_norm: 0.1791\n",
      "06/02 03:50:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1070/6543]  lr: 1.9081e-04  eta: 0:49:47  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.4881  grad_norm: 0.1791\n",
      "06/02 03:50:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1080/6543]  lr: 1.9060e-04  eta: 0:49:39  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.4928  grad_norm: 0.1809\n",
      "06/02 03:50:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1090/6543]  lr: 1.9039e-04  eta: 0:49:31  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.4542  grad_norm: 0.1852\n",
      "06/02 03:50:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1100/6543]  lr: 1.9018e-04  eta: 0:49:23  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.4883  grad_norm: 0.1852\n",
      "06/02 03:50:58 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1110/6543]  lr: 1.8996e-04  eta: 0:49:15  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.4965  grad_norm: 0.1903\n",
      "06/02 03:51:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1120/6543]  lr: 1.8974e-04  eta: 0:49:07  time: 0.4930  data_time: 0.0080  memory: 11508  loss: 1.3753  grad_norm: 0.1856\n",
      "06/02 03:51:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1130/6543]  lr: 1.8952e-04  eta: 0:48:59  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.4171  grad_norm: 0.1856\n",
      "06/02 03:51:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1140/6543]  lr: 1.8930e-04  eta: 0:48:51  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3898  grad_norm: 0.1914\n",
      "06/02 03:51:17 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1150/6543]  lr: 1.8908e-04  eta: 0:48:43  time: 0.4927  data_time: 0.0082  memory: 11508  loss: 1.4847  grad_norm: 0.1914\n",
      "06/02 03:51:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1160/6543]  lr: 1.8885e-04  eta: 0:48:36  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.4707  grad_norm: 0.1858\n",
      "06/02 03:51:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1170/6543]  lr: 1.8863e-04  eta: 0:48:28  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.4629  grad_norm: 0.1916\n",
      "06/02 03:51:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1180/6543]  lr: 1.8839e-04  eta: 0:48:20  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.4413  grad_norm: 0.1916\n",
      "06/02 03:51:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1190/6543]  lr: 1.8816e-04  eta: 0:48:13  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.4657  grad_norm: 0.1874\n",
      "06/02 03:51:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1200/6543]  lr: 1.8793e-04  eta: 0:48:05  time: 0.4924  data_time: 0.0077  memory: 11508  loss: 1.4619  grad_norm: 0.1913\n",
      "06/02 03:51:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1210/6543]  lr: 1.8769e-04  eta: 0:47:58  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.4351  grad_norm: 0.1913\n",
      "06/02 03:51:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1220/6543]  lr: 1.8745e-04  eta: 0:47:50  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.4783  grad_norm: 0.1813\n",
      "06/02 03:51:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1230/6543]  lr: 1.8721e-04  eta: 0:47:43  time: 0.4920  data_time: 0.0078  memory: 11508  loss: 1.4787  grad_norm: 0.1813\n",
      "06/02 03:52:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1240/6543]  lr: 1.8697e-04  eta: 0:47:35  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.4695  grad_norm: 0.1818\n",
      "06/02 03:52:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1250/6543]  lr: 1.8672e-04  eta: 0:47:28  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.4462  grad_norm: 0.1772\n",
      "06/02 03:52:11 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1260/6543]  lr: 1.8647e-04  eta: 0:47:21  time: 0.4921  data_time: 0.0079  memory: 11508  loss: 1.5021  grad_norm: 0.1772\n",
      "06/02 03:52:16 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1270/6543]  lr: 1.8622e-04  eta: 0:47:13  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3749  grad_norm: 0.1750\n",
      "06/02 03:52:21 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1280/6543]  lr: 1.8597e-04  eta: 0:47:06  time: 0.4927  data_time: 0.0078  memory: 11508  loss: 1.3639  grad_norm: 0.1738\n",
      "06/02 03:52:26 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1290/6543]  lr: 1.8572e-04  eta: 0:46:59  time: 0.4919  data_time: 0.0080  memory: 11508  loss: 1.4359  grad_norm: 0.1738\n",
      "06/02 03:52:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1300/6543]  lr: 1.8546e-04  eta: 0:46:52  time: 0.4923  data_time: 0.0077  memory: 11508  loss: 1.4359  grad_norm: 0.1679\n",
      "06/02 03:52:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1310/6543]  lr: 1.8521e-04  eta: 0:46:45  time: 0.4922  data_time: 0.0080  memory: 11508  loss: 1.4841  grad_norm: 0.1679\n",
      "06/02 03:52:41 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1320/6543]  lr: 1.8494e-04  eta: 0:46:38  time: 0.4923  data_time: 0.0077  memory: 11508  loss: 1.5089  grad_norm: 0.1702\n",
      "06/02 03:52:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1330/6543]  lr: 1.8468e-04  eta: 0:46:31  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3994  grad_norm: 0.1680\n",
      "06/02 03:52:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1340/6543]  lr: 1.8442e-04  eta: 0:46:24  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.4250  grad_norm: 0.1680\n",
      "06/02 03:52:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1350/6543]  lr: 1.8415e-04  eta: 0:46:17  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.4244  grad_norm: 0.1694\n",
      "06/02 03:53:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1360/6543]  lr: 1.8388e-04  eta: 0:46:10  time: 0.4939  data_time: 0.0080  memory: 11508  loss: 1.4382  grad_norm: 0.1620\n",
      "06/02 03:53:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1370/6543]  lr: 1.8361e-04  eta: 0:46:03  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.4170  grad_norm: 0.1620\n",
      "06/02 03:53:11 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1380/6543]  lr: 1.8334e-04  eta: 0:45:56  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.4492  grad_norm: 0.1639\n",
      "06/02 03:53:16 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1390/6543]  lr: 1.8307e-04  eta: 0:45:49  time: 0.4917  data_time: 0.0077  memory: 11508  loss: 1.4120  grad_norm: 0.1639\n",
      "06/02 03:53:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1400/6543]  lr: 1.8279e-04  eta: 0:45:42  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.3327  grad_norm: 0.1661\n",
      "06/02 03:53:25 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1410/6543]  lr: 1.8251e-04  eta: 0:45:35  time: 0.4930  data_time: 0.0080  memory: 11508  loss: 1.3637  grad_norm: 0.1671\n",
      "06/02 03:53:30 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1420/6543]  lr: 1.8223e-04  eta: 0:45:28  time: 0.4916  data_time: 0.0077  memory: 11508  loss: 1.4912  grad_norm: 0.1671\n",
      "06/02 03:53:35 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1430/6543]  lr: 1.8195e-04  eta: 0:45:22  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.4267  grad_norm: 0.1640\n",
      "06/02 03:53:40 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1440/6543]  lr: 1.8166e-04  eta: 0:45:15  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.4607  grad_norm: 0.1649\n",
      "06/02 03:53:45 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1450/6543]  lr: 1.8138e-04  eta: 0:45:08  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.4758  grad_norm: 0.1649\n",
      "06/02 03:53:50 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1460/6543]  lr: 1.8109e-04  eta: 0:45:01  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.4228  grad_norm: 0.1644\n",
      "06/02 03:53:55 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1470/6543]  lr: 1.8080e-04  eta: 0:44:55  time: 0.4919  data_time: 0.0078  memory: 11508  loss: 1.3366  grad_norm: 0.1644\n",
      "06/02 03:54:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1480/6543]  lr: 1.8050e-04  eta: 0:44:48  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.4245  grad_norm: 0.1598\n",
      "06/02 03:54:05 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1490/6543]  lr: 1.8021e-04  eta: 0:44:42  time: 0.4930  data_time: 0.0080  memory: 11508  loss: 1.4540  grad_norm: 0.1572\n",
      "06/02 03:54:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1500/6543]  lr: 1.7991e-04  eta: 0:44:35  time: 0.4924  data_time: 0.0081  memory: 11508  loss: 1.3602  grad_norm: 0.1572\n",
      "06/02 03:54:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 03:54:54 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海博物馆\n",
      "4. 上海博物馆\n",
      "5. 上海博物馆 \n",
      "\n",
      "上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，是上海文化遗产。上海博物馆是上海最有名的博物馆，收藏了中国古代文物，是上海文化遗产。上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，是上海文化遗产。上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，是上海文化遗产。上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，是上海文化遗产。上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，是上海文化遗产。上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，是上海文化遗产。上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，是上海文化遗产。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，是上海文化遗产。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，是上海文化遗产。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏\n",
      "\n",
      "06/02 03:54:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. The Bund\n",
      "2. The Oriental Pearl Tower\n",
      "3. The Shanghai Disneyland\n",
      "4. The Yu Garden\n",
      "5. The Suzhou Museum<end_of_turn>\n",
      "\n",
      "06/02 03:54:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 1500 iterations\n",
      "06/02 03:55:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1510/6543]  lr: 1.7961e-04  eta: 0:47:10  time: 5.3406  data_time: 4.8528  memory: 11508  loss: 1.4734  grad_norm: 0.1554\n",
      "06/02 03:55:08 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1520/6543]  lr: 1.7931e-04  eta: 0:47:02  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.4524  grad_norm: 0.1577\n",
      "06/02 03:55:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1530/6543]  lr: 1.7901e-04  eta: 0:46:54  time: 0.4915  data_time: 0.0078  memory: 11508  loss: 1.4442  grad_norm: 0.1577\n",
      "06/02 03:55:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1540/6543]  lr: 1.7871e-04  eta: 0:46:46  time: 0.4924  data_time: 0.0079  memory: 11508  loss: 1.3693  grad_norm: 0.1568\n",
      "06/02 03:55:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1550/6543]  lr: 1.7840e-04  eta: 0:46:38  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.3966  grad_norm: 0.1568\n",
      "06/02 03:55:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1560/6543]  lr: 1.7809e-04  eta: 0:46:31  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3803  grad_norm: 0.1579\n",
      "06/02 03:55:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1570/6543]  lr: 1.7778e-04  eta: 0:46:23  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.4511  grad_norm: 0.1600\n",
      "06/02 03:55:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1580/6543]  lr: 1.7747e-04  eta: 0:46:15  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.3730  grad_norm: 0.1600\n",
      "06/02 03:55:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1590/6543]  lr: 1.7716e-04  eta: 0:46:08  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.4587  grad_norm: 0.1614\n",
      "06/02 03:55:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1600/6543]  lr: 1.7684e-04  eta: 0:46:00  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.4576  grad_norm: 0.1643\n",
      "06/02 03:55:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1610/6543]  lr: 1.7652e-04  eta: 0:45:52  time: 0.4917  data_time: 0.0077  memory: 11508  loss: 1.3895  grad_norm: 0.1643\n",
      "06/02 03:55:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1620/6543]  lr: 1.7620e-04  eta: 0:45:45  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.4023  grad_norm: 0.1679\n",
      "06/02 03:56:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1630/6543]  lr: 1.7588e-04  eta: 0:45:37  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.4085  grad_norm: 0.1679\n",
      "06/02 03:56:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1640/6543]  lr: 1.7556e-04  eta: 0:45:30  time: 0.4931  data_time: 0.0081  memory: 11508  loss: 1.3984  grad_norm: 0.1680\n",
      "06/02 03:56:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1650/6543]  lr: 1.7523e-04  eta: 0:45:22  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.3946  grad_norm: 0.1701\n",
      "06/02 03:56:17 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1660/6543]  lr: 1.7491e-04  eta: 0:45:15  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.4970  grad_norm: 0.1701\n",
      "06/02 03:56:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1670/6543]  lr: 1.7458e-04  eta: 0:45:07  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.4619  grad_norm: 0.1719\n",
      "06/02 03:56:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1680/6543]  lr: 1.7425e-04  eta: 0:45:00  time: 0.4927  data_time: 0.0078  memory: 11508  loss: 1.3786  grad_norm: 0.1710\n",
      "06/02 03:56:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1690/6543]  lr: 1.7391e-04  eta: 0:44:53  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.4470  grad_norm: 0.1710\n",
      "06/02 03:56:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1700/6543]  lr: 1.7358e-04  eta: 0:44:45  time: 0.4930  data_time: 0.0081  memory: 11508  loss: 1.4149  grad_norm: 0.1749\n",
      "06/02 03:56:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1710/6543]  lr: 1.7324e-04  eta: 0:44:38  time: 0.4924  data_time: 0.0081  memory: 11508  loss: 1.4189  grad_norm: 0.1749\n",
      "06/02 03:56:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1720/6543]  lr: 1.7291e-04  eta: 0:44:31  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.4510  grad_norm: 0.1673\n",
      "06/02 03:56:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1730/6543]  lr: 1.7257e-04  eta: 0:44:23  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.4744  grad_norm: 0.1640\n",
      "06/02 03:56:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1740/6543]  lr: 1.7223e-04  eta: 0:44:16  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.4150  grad_norm: 0.1640\n",
      "06/02 03:57:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1750/6543]  lr: 1.7188e-04  eta: 0:44:09  time: 0.4924  data_time: 0.0077  memory: 11508  loss: 1.3495  grad_norm: 0.1649\n",
      "06/02 03:57:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1760/6543]  lr: 1.7154e-04  eta: 0:44:02  time: 0.4932  data_time: 0.0081  memory: 11508  loss: 1.3840  grad_norm: 0.1593\n",
      "06/02 03:57:11 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1770/6543]  lr: 1.7119e-04  eta: 0:43:55  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.5036  grad_norm: 0.1593\n",
      "06/02 03:57:16 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1780/6543]  lr: 1.7084e-04  eta: 0:43:48  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3534  grad_norm: 0.1560\n",
      "06/02 03:57:21 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1790/6543]  lr: 1.7049e-04  eta: 0:43:40  time: 0.4919  data_time: 0.0078  memory: 11508  loss: 1.4820  grad_norm: 0.1560\n",
      "06/02 03:57:26 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1800/6543]  lr: 1.7014e-04  eta: 0:43:33  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3803  grad_norm: 0.1593\n",
      "06/02 03:57:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1810/6543]  lr: 1.6979e-04  eta: 0:43:26  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.4229  grad_norm: 0.1616\n",
      "06/02 03:57:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1820/6543]  lr: 1.6943e-04  eta: 0:43:19  time: 0.4922  data_time: 0.0079  memory: 11508  loss: 1.4056  grad_norm: 0.1616\n",
      "06/02 03:57:41 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1830/6543]  lr: 1.6907e-04  eta: 0:43:12  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.4546  grad_norm: 0.1648\n",
      "06/02 03:57:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1840/6543]  lr: 1.6871e-04  eta: 0:43:05  time: 0.4927  data_time: 0.0078  memory: 11508  loss: 1.4697  grad_norm: 0.1675\n",
      "06/02 03:57:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1850/6543]  lr: 1.6835e-04  eta: 0:42:58  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.4118  grad_norm: 0.1675\n",
      "06/02 03:57:55 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1860/6543]  lr: 1.6799e-04  eta: 0:42:52  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.4066  grad_norm: 0.1669\n",
      "06/02 03:58:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1870/6543]  lr: 1.6763e-04  eta: 0:42:45  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3591  grad_norm: 0.1669\n",
      "06/02 03:58:05 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1880/6543]  lr: 1.6726e-04  eta: 0:42:38  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.4454  grad_norm: 0.1714\n",
      "06/02 03:58:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1890/6543]  lr: 1.6690e-04  eta: 0:42:31  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3797  grad_norm: 0.1727\n",
      "06/02 03:58:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1900/6543]  lr: 1.6653e-04  eta: 0:42:24  time: 0.4935  data_time: 0.0081  memory: 11508  loss: 1.4004  grad_norm: 0.1727\n",
      "06/02 03:58:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1910/6543]  lr: 1.6616e-04  eta: 0:42:17  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.4234  grad_norm: 0.1767\n",
      "06/02 03:58:25 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1920/6543]  lr: 1.6578e-04  eta: 0:42:10  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.3581  grad_norm: 0.1772\n",
      "06/02 03:58:30 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1930/6543]  lr: 1.6541e-04  eta: 0:42:04  time: 0.4922  data_time: 0.0081  memory: 11508  loss: 1.4141  grad_norm: 0.1772\n",
      "06/02 03:58:35 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1940/6543]  lr: 1.6504e-04  eta: 0:41:57  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.4389  grad_norm: 0.1775\n",
      "06/02 03:58:40 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1950/6543]  lr: 1.6466e-04  eta: 0:41:50  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.3671  grad_norm: 0.1775\n",
      "06/02 03:58:45 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1960/6543]  lr: 1.6428e-04  eta: 0:41:43  time: 0.4925  data_time: 0.0077  memory: 11508  loss: 1.3825  grad_norm: 0.1745\n",
      "06/02 03:58:50 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1970/6543]  lr: 1.6390e-04  eta: 0:41:37  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.4214  grad_norm: 0.1703\n",
      "06/02 03:58:55 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1980/6543]  lr: 1.6352e-04  eta: 0:41:30  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.4377  grad_norm: 0.1703\n",
      "06/02 03:58:59 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [1990/6543]  lr: 1.6314e-04  eta: 0:41:23  time: 0.4925  data_time: 0.0077  memory: 11508  loss: 1.4075  grad_norm: 0.1703\n",
      "06/02 03:59:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Exp name: gemma_2b_it_qlora_alpaca_e3_20240602_034031\n",
      "06/02 03:59:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2000/6543]  lr: 1.6275e-04  eta: 0:41:17  time: 0.4931  data_time: 0.0081  memory: 11508  loss: 1.4695  grad_norm: 0.1676\n",
      "06/02 03:59:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 03:59:49 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海博物馆\n",
      "4. 上海博物馆\n",
      "5. 上海博物馆 \n",
      "\n",
      "上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，拥有世界上最古老的建筑，是上海最著名的景点。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，包括陶器、漆器、陶瓷器、壁画、雕塑、家具等。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，包括陶器、漆器、陶瓷器、壁画、雕塑、家具等。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，包括陶器、漆器、陶瓷器、壁画、雕塑、家具等。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，包括陶器、漆器、陶瓷器、壁画、雕塑、家具等。\n",
      "\n",
      "上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，拥有世界上最古老的建筑，是上海最著名的景点。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，包括陶器、漆器、陶瓷器、壁画、雕塑、家具等。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，包括陶器、漆器、陶瓷器、壁画、雕塑、家具等。\n",
      "\n",
      "上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，拥有世界上最古老的建筑，是上海最著名的景点。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，包括陶器、漆器、陶瓷器、壁画、雕塑、家具等。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，包括陶器、漆器、陶瓷器、壁画、雕塑、家具等。\n",
      "\n",
      "上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，拥有世界上最古老的建筑，是上海最著名的景点。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，包括陶器、漆器、陶瓷器、壁画、雕塑、家具等。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，包括陶器、漆器、陶瓷器、壁画、雕塑、家具等。\n",
      "\n",
      "上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，拥有世界上最古老的建筑，是上海最著名的景点。\n",
      "\n",
      "上海博物馆是上海最著名的博物馆，收藏了中国古代文物，包括陶器、漆器、陶瓷器、壁画、雕塑、家具等。\n",
      "\n",
      "上海故宫是上海最著名的景点，是世界上保存最完好的古代建筑群，拥有世界上最古老的建筑，是上海最著名的景点。\n",
      "\n",
      "上海\n",
      "\n",
      "06/02 03:59:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. The Bund\n",
      "2. The Oriental Pearl Tower\n",
      "3. The Shanghai Tower\n",
      "4. The Yu Garden\n",
      "5. The Shanghai Museum<end_of_turn>\n",
      "\n",
      "06/02 03:59:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 2000 iterations\n",
      "06/02 03:59:58 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2010/6543]  lr: 1.6237e-04  eta: 0:43:00  time: 5.3717  data_time: 4.8849  memory: 11508  loss: 1.4404  grad_norm: 0.1676\n",
      "06/02 04:00:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2020/6543]  lr: 1.6198e-04  eta: 0:42:53  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.4596  grad_norm: 0.1665\n",
      "06/02 04:00:08 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2030/6543]  lr: 1.6159e-04  eta: 0:42:45  time: 0.4917  data_time: 0.0079  memory: 11508  loss: 1.3803  grad_norm: 0.1665\n",
      "06/02 04:00:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2040/6543]  lr: 1.6120e-04  eta: 0:42:38  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.4630  grad_norm: 0.1632\n",
      "06/02 04:00:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2050/6543]  lr: 1.6081e-04  eta: 0:42:30  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.4776  grad_norm: 0.1615\n",
      "06/02 04:00:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2060/6543]  lr: 1.6041e-04  eta: 0:42:23  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.3700  grad_norm: 0.1615\n",
      "06/02 04:00:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2070/6543]  lr: 1.6002e-04  eta: 0:42:16  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3993  grad_norm: 0.1590\n",
      "06/02 04:00:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2080/6543]  lr: 1.5962e-04  eta: 0:42:09  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.4120  grad_norm: 0.1637\n",
      "06/02 04:00:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2090/6543]  lr: 1.5922e-04  eta: 0:42:01  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.3976  grad_norm: 0.1637\n",
      "06/02 04:00:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2100/6543]  lr: 1.5882e-04  eta: 0:41:54  time: 0.4925  data_time: 0.0077  memory: 11508  loss: 1.4082  grad_norm: 0.1639\n",
      "06/02 04:00:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2110/6543]  lr: 1.5842e-04  eta: 0:41:47  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.4301  grad_norm: 0.1639\n",
      "06/02 04:00:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2120/6543]  lr: 1.5802e-04  eta: 0:41:40  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.4263  grad_norm: 0.1646\n",
      "06/02 04:00:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2130/6543]  lr: 1.5762e-04  eta: 0:41:33  time: 0.4930  data_time: 0.0080  memory: 11508  loss: 1.3449  grad_norm: 0.1686\n",
      "06/02 04:01:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2140/6543]  lr: 1.5721e-04  eta: 0:41:25  time: 0.4925  data_time: 0.0082  memory: 11508  loss: 1.4205  grad_norm: 0.1686\n",
      "06/02 04:01:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2150/6543]  lr: 1.5680e-04  eta: 0:41:18  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.4011  grad_norm: 0.1688\n",
      "06/02 04:01:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2160/6543]  lr: 1.5640e-04  eta: 0:41:11  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.4274  grad_norm: 0.1761\n",
      "06/02 04:01:17 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2170/6543]  lr: 1.5599e-04  eta: 0:41:04  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.4245  grad_norm: 0.1761\n",
      "06/02 04:01:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2180/6543]  lr: 1.5558e-04  eta: 0:40:57  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.4346  grad_norm: 0.1736\n",
      "06/02 04:01:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Exp name: gemma_2b_it_qlora_alpaca_e3_20240602_034031\n",
      "06/02 04:01:22 - mmengine - \u001b[5m\u001b[4m\u001b[33mWARNING\u001b[0m - Reach the end of the dataloader, it will be restarted and continue to iterate. It is recommended to use `mmengine.dataset.InfiniteSampler` to enable the dataloader to iterate infinitely.\n",
      "06/02 04:01:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2190/6543]  lr: 1.5516e-04  eta: 0:40:54  time: 0.6950  data_time: 0.2084  memory: 11508  loss: 1.4103  grad_norm: 0.1736\n",
      "06/02 04:01:34 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2200/6543]  lr: 1.5475e-04  eta: 0:40:47  time: 0.4931  data_time: 0.0081  memory: 11508  loss: 1.3430  grad_norm: 0.1763\n",
      "06/02 04:01:39 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2210/6543]  lr: 1.5433e-04  eta: 0:40:40  time: 0.4926  data_time: 0.0080  memory: 11508  loss: 1.4228  grad_norm: 0.1806\n",
      "06/02 04:01:44 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2220/6543]  lr: 1.5392e-04  eta: 0:40:33  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.4121  grad_norm: 0.1806\n",
      "06/02 04:01:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2230/6543]  lr: 1.5350e-04  eta: 0:40:26  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.4248  grad_norm: 0.1769\n",
      "06/02 04:01:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2240/6543]  lr: 1.5308e-04  eta: 0:40:19  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.3276  grad_norm: 0.1755\n",
      "06/02 04:01:58 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2250/6543]  lr: 1.5266e-04  eta: 0:40:12  time: 0.4924  data_time: 0.0081  memory: 11508  loss: 1.3616  grad_norm: 0.1755\n",
      "06/02 04:02:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2260/6543]  lr: 1.5224e-04  eta: 0:40:05  time: 0.4933  data_time: 0.0081  memory: 11508  loss: 1.3354  grad_norm: 0.1745\n",
      "06/02 04:02:08 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2270/6543]  lr: 1.5182e-04  eta: 0:39:58  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.2996  grad_norm: 0.1745\n",
      "06/02 04:02:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2280/6543]  lr: 1.5139e-04  eta: 0:39:51  time: 0.4931  data_time: 0.0081  memory: 11508  loss: 1.3225  grad_norm: 0.1741\n",
      "06/02 04:02:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2290/6543]  lr: 1.5097e-04  eta: 0:39:44  time: 0.4928  data_time: 0.0081  memory: 11508  loss: 1.3299  grad_norm: 0.1705\n",
      "06/02 04:02:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2300/6543]  lr: 1.5054e-04  eta: 0:39:38  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.4036  grad_norm: 0.1705\n",
      "06/02 04:02:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2310/6543]  lr: 1.5011e-04  eta: 0:39:31  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.3353  grad_norm: 0.1683\n",
      "06/02 04:02:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2320/6543]  lr: 1.4969e-04  eta: 0:39:24  time: 0.4932  data_time: 0.0081  memory: 11508  loss: 1.3871  grad_norm: 0.1609\n",
      "06/02 04:02:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2330/6543]  lr: 1.4926e-04  eta: 0:39:17  time: 0.4922  data_time: 0.0081  memory: 11508  loss: 1.3734  grad_norm: 0.1609\n",
      "06/02 04:02:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2340/6543]  lr: 1.4882e-04  eta: 0:39:10  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.4159  grad_norm: 0.1614\n",
      "06/02 04:02:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2350/6543]  lr: 1.4839e-04  eta: 0:39:03  time: 0.4923  data_time: 0.0080  memory: 11508  loss: 1.3337  grad_norm: 0.1614\n",
      "06/02 04:02:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2360/6543]  lr: 1.4796e-04  eta: 0:38:57  time: 0.4929  data_time: 0.0078  memory: 11508  loss: 1.3291  grad_norm: 0.1596\n",
      "06/02 04:02:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2370/6543]  lr: 1.4752e-04  eta: 0:38:50  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.3258  grad_norm: 0.1619\n",
      "06/02 04:03:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2380/6543]  lr: 1.4709e-04  eta: 0:38:43  time: 0.4922  data_time: 0.0081  memory: 11508  loss: 1.3055  grad_norm: 0.1619\n",
      "06/02 04:03:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2390/6543]  lr: 1.4665e-04  eta: 0:38:36  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.3674  grad_norm: 0.1627\n",
      "06/02 04:03:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2400/6543]  lr: 1.4621e-04  eta: 0:38:30  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.3312  grad_norm: 0.1608\n",
      "06/02 04:03:17 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2410/6543]  lr: 1.4577e-04  eta: 0:38:23  time: 0.4922  data_time: 0.0081  memory: 11508  loss: 1.3551  grad_norm: 0.1608\n",
      "06/02 04:03:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2420/6543]  lr: 1.4533e-04  eta: 0:38:16  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.3554  grad_norm: 0.1624\n",
      "06/02 04:03:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2430/6543]  lr: 1.4489e-04  eta: 0:38:10  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3816  grad_norm: 0.1624\n",
      "06/02 04:03:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2440/6543]  lr: 1.4445e-04  eta: 0:38:03  time: 0.4936  data_time: 0.0081  memory: 11508  loss: 1.3351  grad_norm: 0.1639\n",
      "06/02 04:03:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2450/6543]  lr: 1.4400e-04  eta: 0:37:56  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.3798  grad_norm: 0.1662\n",
      "06/02 04:03:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2460/6543]  lr: 1.4356e-04  eta: 0:37:50  time: 0.4922  data_time: 0.0080  memory: 11508  loss: 1.2750  grad_norm: 0.1662\n",
      "06/02 04:03:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2470/6543]  lr: 1.4311e-04  eta: 0:37:43  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.2926  grad_norm: 0.1658\n",
      "06/02 04:03:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2480/6543]  lr: 1.4266e-04  eta: 0:37:37  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.3371  grad_norm: 0.1672\n",
      "06/02 04:03:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2490/6543]  lr: 1.4222e-04  eta: 0:37:30  time: 0.4935  data_time: 0.0090  memory: 11508  loss: 1.3823  grad_norm: 0.1672\n",
      "06/02 04:04:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2500/6543]  lr: 1.4177e-04  eta: 0:37:24  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.3937  grad_norm: 0.1695\n",
      "06/02 04:04:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 04:04:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海天坛\n",
      "5. 上海中山公园 \n",
      "\n",
      "上海故宫是上海最著名的景点之一，它拥有了世界上最古老的建筑，是上海文明史上的重要文化遗产。上海博物馆是上海重要的文化教育机构，它拥有了世界上最古老的博物馆，是上海文明史上的重要文化遗产。上海虹桥是上海重要的交通枢纽，它拥有了世界上最古老的桥梁，是上海文明史上的重要文化遗产。上海天坛是上海重要的宗教文化中心，它拥有了世界上最古老的寺庙，是上海文明史上的重要文化遗产。上海中山公园是上海重要的文化娱乐场所，它拥有了世界上最古老的公园，是上海文明史上的重要文化遗产。\n",
      "\n",
      "上海的景点是上海文明史上的重要文化遗产，是上海文明史上的重要文化遗产。\n",
      " \n",
      "1. 上海故宫 \n",
      "2. 上海博物馆 \n",
      "3. 上海虹桥 \n",
      "4. 上海天坛 \n",
      "5. 上海中山公园 \n",
      "\n",
      "上海的景点是上海文明史上的重要文化遗产，是上海文明史上的重要文化遗产。<end_of_turn>\n",
      "\n",
      "06/02 04:04:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. The Bund\n",
      "2. The Oriental Pearl Tower\n",
      "3. The Shanghai Tower\n",
      "4. The Yu Garden\n",
      "5. The Shanghai Museum<end_of_turn>\n",
      "\n",
      "06/02 04:04:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 2500 iterations\n",
      "06/02 04:04:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2510/6543]  lr: 1.4132e-04  eta: 0:37:54  time: 2.8005  data_time: 2.3143  memory: 11508  loss: 1.2985  grad_norm: 0.1695\n",
      "06/02 04:04:34 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2520/6543]  lr: 1.4086e-04  eta: 0:37:47  time: 0.4940  data_time: 0.0079  memory: 11508  loss: 1.3612  grad_norm: 0.1715\n",
      "06/02 04:04:39 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2530/6543]  lr: 1.4041e-04  eta: 0:37:41  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.4245  grad_norm: 0.1672\n",
      "06/02 04:04:44 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2540/6543]  lr: 1.3996e-04  eta: 0:37:34  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.3337  grad_norm: 0.1672\n",
      "06/02 04:04:49 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2550/6543]  lr: 1.3951e-04  eta: 0:37:27  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.3829  grad_norm: 0.1707\n",
      "06/02 04:04:54 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2560/6543]  lr: 1.3905e-04  eta: 0:37:20  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3386  grad_norm: 0.1714\n",
      "06/02 04:04:59 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2570/6543]  lr: 1.3859e-04  eta: 0:37:14  time: 0.4919  data_time: 0.0080  memory: 11508  loss: 1.3685  grad_norm: 0.1714\n",
      "06/02 04:05:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2580/6543]  lr: 1.3814e-04  eta: 0:37:07  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.4593  grad_norm: 0.1711\n",
      "06/02 04:05:09 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2590/6543]  lr: 1.3768e-04  eta: 0:37:00  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.3394  grad_norm: 0.1711\n",
      "06/02 04:05:14 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2600/6543]  lr: 1.3722e-04  eta: 0:36:54  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3364  grad_norm: 0.1707\n",
      "06/02 04:05:19 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2610/6543]  lr: 1.3676e-04  eta: 0:36:47  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3387  grad_norm: 0.1705\n",
      "06/02 04:05:24 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2620/6543]  lr: 1.3630e-04  eta: 0:36:40  time: 0.4924  data_time: 0.0084  memory: 11508  loss: 1.4058  grad_norm: 0.1705\n",
      "06/02 04:05:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2630/6543]  lr: 1.3584e-04  eta: 0:36:34  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.2916  grad_norm: 0.1726\n",
      "06/02 04:05:34 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2640/6543]  lr: 1.3538e-04  eta: 0:36:27  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3240  grad_norm: 0.1717\n",
      "06/02 04:05:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2650/6543]  lr: 1.3491e-04  eta: 0:36:20  time: 0.4921  data_time: 0.0079  memory: 11508  loss: 1.2980  grad_norm: 0.1717\n",
      "06/02 04:05:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2660/6543]  lr: 1.3445e-04  eta: 0:36:14  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.3914  grad_norm: 0.1711\n",
      "06/02 04:05:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2670/6543]  lr: 1.3398e-04  eta: 0:36:07  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.3801  grad_norm: 0.1711\n",
      "06/02 04:05:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2680/6543]  lr: 1.3352e-04  eta: 0:36:01  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3659  grad_norm: 0.1725\n",
      "06/02 04:05:58 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2690/6543]  lr: 1.3305e-04  eta: 0:35:54  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3176  grad_norm: 0.1731\n",
      "06/02 04:06:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2700/6543]  lr: 1.3258e-04  eta: 0:35:48  time: 0.4921  data_time: 0.0080  memory: 11508  loss: 1.4524  grad_norm: 0.1731\n",
      "06/02 04:06:08 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2710/6543]  lr: 1.3211e-04  eta: 0:35:41  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.3255  grad_norm: 0.1712\n",
      "06/02 04:06:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2720/6543]  lr: 1.3164e-04  eta: 0:35:34  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.3588  grad_norm: 0.1725\n",
      "06/02 04:06:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2730/6543]  lr: 1.3117e-04  eta: 0:35:28  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.2092  grad_norm: 0.1725\n",
      "06/02 04:06:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2740/6543]  lr: 1.3070e-04  eta: 0:35:21  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3592  grad_norm: 0.1762\n",
      "06/02 04:06:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2750/6543]  lr: 1.3023e-04  eta: 0:35:15  time: 0.4921  data_time: 0.0079  memory: 11508  loss: 1.4328  grad_norm: 0.1762\n",
      "06/02 04:06:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2760/6543]  lr: 1.2976e-04  eta: 0:35:09  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3384  grad_norm: 0.1763\n",
      "06/02 04:06:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2770/6543]  lr: 1.2929e-04  eta: 0:35:02  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.3372  grad_norm: 0.1755\n",
      "06/02 04:06:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2780/6543]  lr: 1.2881e-04  eta: 0:34:56  time: 0.4921  data_time: 0.0079  memory: 11508  loss: 1.4127  grad_norm: 0.1755\n",
      "06/02 04:06:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2790/6543]  lr: 1.2834e-04  eta: 0:34:49  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3195  grad_norm: 0.1747\n",
      "06/02 04:06:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2800/6543]  lr: 1.2786e-04  eta: 0:34:43  time: 0.4930  data_time: 0.0080  memory: 11508  loss: 1.3814  grad_norm: 0.1761\n",
      "06/02 04:06:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2810/6543]  lr: 1.2739e-04  eta: 0:34:36  time: 0.4919  data_time: 0.0080  memory: 11508  loss: 1.3363  grad_norm: 0.1761\n",
      "06/02 04:07:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2820/6543]  lr: 1.2691e-04  eta: 0:34:30  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3448  grad_norm: 0.1757\n",
      "06/02 04:07:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2830/6543]  lr: 1.2644e-04  eta: 0:34:24  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.2978  grad_norm: 0.1757\n",
      "06/02 04:07:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2840/6543]  lr: 1.2596e-04  eta: 0:34:17  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3761  grad_norm: 0.1723\n",
      "06/02 04:07:17 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2850/6543]  lr: 1.2548e-04  eta: 0:34:11  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.3413  grad_norm: 0.1706\n",
      "06/02 04:07:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2860/6543]  lr: 1.2500e-04  eta: 0:34:04  time: 0.4921  data_time: 0.0079  memory: 11508  loss: 1.4397  grad_norm: 0.1706\n",
      "06/02 04:07:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2870/6543]  lr: 1.2452e-04  eta: 0:33:58  time: 0.4929  data_time: 0.0079  memory: 11508  loss: 1.4184  grad_norm: 0.1723\n",
      "06/02 04:07:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2880/6543]  lr: 1.2404e-04  eta: 0:33:52  time: 0.4932  data_time: 0.0078  memory: 11508  loss: 1.4025  grad_norm: 0.1734\n",
      "06/02 04:07:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2890/6543]  lr: 1.2356e-04  eta: 0:33:45  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.3367  grad_norm: 0.1734\n",
      "06/02 04:07:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2900/6543]  lr: 1.2308e-04  eta: 0:33:39  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3139  grad_norm: 0.1721\n",
      "06/02 04:07:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2910/6543]  lr: 1.2260e-04  eta: 0:33:33  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.3598  grad_norm: 0.1721\n",
      "06/02 04:07:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2920/6543]  lr: 1.2211e-04  eta: 0:33:26  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.4285  grad_norm: 0.1730\n",
      "06/02 04:07:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2930/6543]  lr: 1.2163e-04  eta: 0:33:20  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3749  grad_norm: 0.1742\n",
      "06/02 04:08:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2940/6543]  lr: 1.2115e-04  eta: 0:33:14  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.4174  grad_norm: 0.1742\n",
      "06/02 04:08:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2950/6543]  lr: 1.2066e-04  eta: 0:33:08  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.2953  grad_norm: 0.1736\n",
      "06/02 04:08:11 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2960/6543]  lr: 1.2018e-04  eta: 0:33:01  time: 0.4927  data_time: 0.0078  memory: 11508  loss: 1.4379  grad_norm: 0.1728\n",
      "06/02 04:08:16 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2970/6543]  lr: 1.1969e-04  eta: 0:32:55  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.3286  grad_norm: 0.1728\n",
      "06/02 04:08:21 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2980/6543]  lr: 1.1921e-04  eta: 0:32:49  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.3670  grad_norm: 0.1718\n",
      "06/02 04:08:26 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [2990/6543]  lr: 1.1872e-04  eta: 0:32:43  time: 0.4921  data_time: 0.0080  memory: 11508  loss: 1.3526  grad_norm: 0.1718\n",
      "06/02 04:08:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Exp name: gemma_2b_it_qlora_alpaca_e3_20240602_034031\n",
      "06/02 04:08:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3000/6543]  lr: 1.1824e-04  eta: 0:32:36  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3925  grad_norm: 0.1724\n",
      "06/02 04:08:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 04:09:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海天坛\n",
      "5. 上海中山公园 \n",
      "\n",
      "上海拥有众多历史悠久的景点，如上海故宫、上海博物馆、上海虹桥、上海天坛和上海中山公园，这些景点都拥有独特的历史文化，是上海不可或缺的景点。 \n",
      "\n",
      "上海故宫是上海最古老的景点，是世界上保存最完好的古代建筑，是上海重要的历史文化遗产。上海博物馆是上海重要的科学文化教育机构，收藏了世界上最完好的古代文物。上海虹桥是上海重要的交通枢纽，是上海重要的文化娱乐活动中心。上海天坛是上海重要的宗教文化中心，是上海重要的宗教文化活动中心。上海中山公园是上海重要的绿化娱乐休闲地，是上海重要的文化娱乐活动中心。\n",
      "\n",
      "上海拥有众多历史悠久的景点，是上海不可或缺的景点，是上海重要的文化娱乐活动中心。 \n",
      "\n",
      "希望我的回答能帮助您！ \n",
      " \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "  \n",
      "\n",
      "06/02 04:09:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Five scenic spots in Shanghai are: \n",
      "1. The Bund \n",
      "2. The Oriental Pearl Tower \n",
      "3. The Shanghai Tower \n",
      "4. The Yu Garden \n",
      "5. The Jade Buddha Temple<end_of_turn>\n",
      "\n",
      "06/02 04:09:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 3000 iterations\n",
      "06/02 04:09:26 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3010/6543]  lr: 1.1775e-04  eta: 0:33:29  time: 5.4832  data_time: 4.9954  memory: 11508  loss: 1.3339  grad_norm: 0.1739\n",
      "06/02 04:09:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3020/6543]  lr: 1.1726e-04  eta: 0:33:22  time: 0.4916  data_time: 0.0079  memory: 11508  loss: 1.3802  grad_norm: 0.1739\n",
      "06/02 04:09:35 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3030/6543]  lr: 1.1677e-04  eta: 0:33:15  time: 0.4924  data_time: 0.0079  memory: 11508  loss: 1.3003  grad_norm: 0.1717\n",
      "06/02 04:09:40 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3040/6543]  lr: 1.1629e-04  eta: 0:33:09  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.3797  grad_norm: 0.1717\n",
      "06/02 04:09:45 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3050/6543]  lr: 1.1580e-04  eta: 0:33:02  time: 0.4914  data_time: 0.0077  memory: 11508  loss: 1.3252  grad_norm: 0.1717\n",
      "06/02 04:09:50 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3060/6543]  lr: 1.1531e-04  eta: 0:32:56  time: 0.4926  data_time: 0.0080  memory: 11508  loss: 1.3242  grad_norm: 0.1703\n",
      "06/02 04:09:55 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3070/6543]  lr: 1.1482e-04  eta: 0:32:49  time: 0.4916  data_time: 0.0078  memory: 11508  loss: 1.3430  grad_norm: 0.1703\n",
      "06/02 04:10:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3080/6543]  lr: 1.1433e-04  eta: 0:32:43  time: 0.4926  data_time: 0.0080  memory: 11508  loss: 1.3453  grad_norm: 0.1740\n",
      "06/02 04:10:05 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3090/6543]  lr: 1.1384e-04  eta: 0:32:36  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.3743  grad_norm: 0.1728\n",
      "06/02 04:10:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3100/6543]  lr: 1.1335e-04  eta: 0:32:30  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.3548  grad_norm: 0.1728\n",
      "06/02 04:10:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3110/6543]  lr: 1.1286e-04  eta: 0:32:23  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.3656  grad_norm: 0.1742\n",
      "06/02 04:10:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3120/6543]  lr: 1.1237e-04  eta: 0:32:17  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.3154  grad_norm: 0.1738\n",
      "06/02 04:10:25 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3130/6543]  lr: 1.1188e-04  eta: 0:32:10  time: 0.4926  data_time: 0.0085  memory: 11508  loss: 1.3811  grad_norm: 0.1738\n",
      "06/02 04:10:30 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3140/6543]  lr: 1.1138e-04  eta: 0:32:04  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3373  grad_norm: 0.1756\n",
      "06/02 04:10:35 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3150/6543]  lr: 1.1089e-04  eta: 0:31:58  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3638  grad_norm: 0.1756\n",
      "06/02 04:10:39 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3160/6543]  lr: 1.1040e-04  eta: 0:31:51  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3672  grad_norm: 0.1858\n",
      "06/02 04:10:44 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3170/6543]  lr: 1.0991e-04  eta: 0:31:45  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3493  grad_norm: 0.1855\n",
      "06/02 04:10:49 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3180/6543]  lr: 1.0942e-04  eta: 0:31:38  time: 0.4922  data_time: 0.0080  memory: 11508  loss: 1.3056  grad_norm: 0.1855\n",
      "06/02 04:10:54 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3190/6543]  lr: 1.0892e-04  eta: 0:31:32  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3848  grad_norm: 0.1936\n",
      "06/02 04:10:59 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3200/6543]  lr: 1.0843e-04  eta: 0:31:25  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.2942  grad_norm: 0.1949\n",
      "06/02 04:11:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3210/6543]  lr: 1.0794e-04  eta: 0:31:19  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.3965  grad_norm: 0.1949\n",
      "06/02 04:11:09 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3220/6543]  lr: 1.0744e-04  eta: 0:31:13  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.4331  grad_norm: 0.2036\n",
      "06/02 04:11:14 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3230/6543]  lr: 1.0695e-04  eta: 0:31:06  time: 0.4919  data_time: 0.0078  memory: 11508  loss: 1.3007  grad_norm: 0.2036\n",
      "06/02 04:11:19 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3240/6543]  lr: 1.0645e-04  eta: 0:31:00  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.4063  grad_norm: 0.2016\n",
      "06/02 04:11:24 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3250/6543]  lr: 1.0596e-04  eta: 0:30:54  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3512  grad_norm: 0.2048\n",
      "06/02 04:11:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3260/6543]  lr: 1.0547e-04  eta: 0:30:47  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.3737  grad_norm: 0.2048\n",
      "06/02 04:11:34 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3270/6543]  lr: 1.0497e-04  eta: 0:30:41  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.4158  grad_norm: 0.2041\n",
      "06/02 04:11:39 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3280/6543]  lr: 1.0448e-04  eta: 0:30:35  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.3181  grad_norm: 0.2071\n",
      "06/02 04:11:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3290/6543]  lr: 1.0398e-04  eta: 0:30:28  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3208  grad_norm: 0.2071\n",
      "06/02 04:11:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3300/6543]  lr: 1.0349e-04  eta: 0:30:22  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3133  grad_norm: 0.2100\n",
      "06/02 04:11:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3310/6543]  lr: 1.0299e-04  eta: 0:30:16  time: 0.4921  data_time: 0.0079  memory: 11508  loss: 1.4277  grad_norm: 0.2100\n",
      "06/02 04:11:58 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3320/6543]  lr: 1.0250e-04  eta: 0:30:09  time: 0.4931  data_time: 0.0081  memory: 11508  loss: 1.3513  grad_norm: 0.2085\n",
      "06/02 04:12:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3330/6543]  lr: 1.0200e-04  eta: 0:30:03  time: 0.4931  data_time: 0.0081  memory: 11508  loss: 1.3429  grad_norm: 0.2069\n",
      "06/02 04:12:08 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3340/6543]  lr: 1.0151e-04  eta: 0:29:57  time: 0.4925  data_time: 0.0083  memory: 11508  loss: 1.3706  grad_norm: 0.2069\n",
      "06/02 04:12:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3350/6543]  lr: 1.0101e-04  eta: 0:29:51  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3687  grad_norm: 0.2012\n",
      "06/02 04:12:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3360/6543]  lr: 1.0052e-04  eta: 0:29:44  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.4320  grad_norm: 0.2069\n",
      "06/02 04:12:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3370/6543]  lr: 1.0002e-04  eta: 0:29:38  time: 0.4928  data_time: 0.0088  memory: 11508  loss: 1.2986  grad_norm: 0.2069\n",
      "06/02 04:12:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3380/6543]  lr: 9.9530e-05  eta: 0:29:32  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2843  grad_norm: 0.1990\n",
      "06/02 04:12:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3390/6543]  lr: 9.9035e-05  eta: 0:29:26  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.3140  grad_norm: 0.1990\n",
      "06/02 04:12:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3400/6543]  lr: 9.8540e-05  eta: 0:29:19  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3979  grad_norm: 0.2056\n",
      "06/02 04:12:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3410/6543]  lr: 9.8045e-05  eta: 0:29:13  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.4103  grad_norm: 0.2029\n",
      "06/02 04:12:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3420/6543]  lr: 9.7550e-05  eta: 0:29:07  time: 0.4927  data_time: 0.0084  memory: 11508  loss: 1.3641  grad_norm: 0.2029\n",
      "06/02 04:12:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3430/6543]  lr: 9.7055e-05  eta: 0:29:01  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3686  grad_norm: 0.2073\n",
      "06/02 04:12:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3440/6543]  lr: 9.6561e-05  eta: 0:28:55  time: 0.4930  data_time: 0.0080  memory: 11508  loss: 1.3445  grad_norm: 0.2075\n",
      "06/02 04:13:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3450/6543]  lr: 9.6066e-05  eta: 0:28:48  time: 0.4921  data_time: 0.0081  memory: 11508  loss: 1.2983  grad_norm: 0.2075\n",
      "06/02 04:13:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3460/6543]  lr: 9.5571e-05  eta: 0:28:42  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.3036  grad_norm: 0.2052\n",
      "06/02 04:13:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3470/6543]  lr: 9.5077e-05  eta: 0:28:36  time: 0.4918  data_time: 0.0077  memory: 11508  loss: 1.3261  grad_norm: 0.2052\n",
      "06/02 04:13:17 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3480/6543]  lr: 9.4583e-05  eta: 0:28:30  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3395  grad_norm: 0.2002\n",
      "06/02 04:13:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3490/6543]  lr: 9.4089e-05  eta: 0:28:24  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.3388  grad_norm: 0.2094\n",
      "06/02 04:13:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3500/6543]  lr: 9.3594e-05  eta: 0:28:17  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3546  grad_norm: 0.2094\n",
      "06/02 04:13:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 04:14:11 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海天坛\n",
      "5. 上海南山公园\n",
      "6. 上海市博物馆\n",
      "7. 上海科学技术馆\n",
      "8. 上海海洋博物馆\n",
      "9. 上海博物馆\n",
      "10. 上海自然博物馆\n",
      "11. 上海故宫\n",
      "12. 上海博物馆\n",
      "13. 上海虹桥\n",
      "14. 上海天坛\n",
      "15. 上海南山公园\n",
      "16. 上海科学技术馆\n",
      "17. 上海博物馆\n",
      "18. 上海自然博物馆\n",
      "19. 上海海洋博物馆\n",
      "20. 上海博物馆\n",
      "21. 上海自然博物馆\n",
      "22. 上海博物馆\n",
      "23. 上海自然博物馆\n",
      "24. 上海博物馆\n",
      "25. 上海博物馆\n",
      "26. 上海博物馆\n",
      "27. 上海博物馆\n",
      "28. 上海博物馆\n",
      "29. 上海博物馆\n",
      "30. 上海博物馆\n",
      "31. 上海博物馆\n",
      "32. 上海博物馆\n",
      "33. 上海博物馆\n",
      "34. 上海博物馆\n",
      "35. 上海博物馆\n",
      "36. 上海博物馆\n",
      "37. 上海博物馆\n",
      "38. 上海博物馆\n",
      "39. 上海博物馆\n",
      "40. 上海博物馆\n",
      "41. 上海博物馆\n",
      "42. 上海博物馆\n",
      "43. 上海博物馆\n",
      "44. 上海博物馆\n",
      "45. 上海博物馆\n",
      "46. 上海博物馆\n",
      "47. 上海博物馆\n",
      "48. 上海博物馆\n",
      "49. 上海博物馆\n",
      "50. 上海博物馆\n",
      "51. 上海博物馆\n",
      "52. 上海博物馆\n",
      "53. 上海博物馆\n",
      "54. 上海博物馆\n",
      "55. 上海博物馆\n",
      "56. 上海博物馆\n",
      "57. 上海博物馆\n",
      "58. 上海博物馆\n",
      "59. 上海博物馆\n",
      "60. 上海博物馆\n",
      "61. 上海博物馆\n",
      "62. 上海博物馆\n",
      "63. 上海博物馆\n",
      "64. 上海博物馆\n",
      "65. 上海博物馆\n",
      "66. 上海博物馆\n",
      "67. 上海博物馆\n",
      "68. 上海博物馆\n",
      "69. 上海博物馆\n",
      "70. 上海博物馆\n",
      "71. 上海博物馆\n",
      "72. 上海博物馆\n",
      "73. 上海博物馆\n",
      "74. 上海博物馆\n",
      "75. 上海博物馆\n",
      "76. 上海博物馆\n",
      "77. 上海博物馆\n",
      "78. 上海博物馆\n",
      "79. 上海博物馆\n",
      "80. 上海博物馆\n",
      "81. 上海博物馆\n",
      "82. 上海博物馆\n",
      "83. 上海博物馆\n",
      "84. 上海博物馆\n",
      "85. 上海博物馆\n",
      "86. 上海博物馆\n",
      "87. 上海博物馆\n",
      "88. 上海博物馆\n",
      "89. 上海博物馆\n",
      "90. 上海博物馆\n",
      "91. 上海博物馆\n",
      "92. 上海博物馆\n",
      "93. 上海博物馆\n",
      "94. 上海博物馆\n",
      "95. 上海博物馆\n",
      "96. 上海博物馆\n",
      "97. 上海博物馆\n",
      "98. 上海博物馆\n",
      "\n",
      "\n",
      "06/02 04:14:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Five scenic spots in Shanghai include the Bund, the Oriental Pearl Tower, the Yu Garden, the Shanghai Disneyland, and the Shanghai Museum. The Bund is a waterfront promenade that runs along the Huangpu River and is lined with shops, restaurants, and attractions. The Oriental Pearl Tower is a 632-meter tall tower that is the tallest structure in Shanghai. The Yu Garden is a traditional Chinese garden located in the heart of the city. The Shanghai Disneyland is a theme park located in the Pudong district. The Shanghai Museum is a museum located in the heart of the city.<end_of_turn>\n",
      "\n",
      "06/02 04:14:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 3500 iterations\n",
      "06/02 04:14:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3510/6543]  lr: 9.3101e-05  eta: 0:28:59  time: 5.9991  data_time: 5.5111  memory: 11508  loss: 1.3865  grad_norm: 0.2116\n",
      "06/02 04:14:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3520/6543]  lr: 9.2607e-05  eta: 0:28:53  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.3243  grad_norm: 0.2140\n",
      "06/02 04:14:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3530/6543]  lr: 9.2113e-05  eta: 0:28:46  time: 0.4916  data_time: 0.0078  memory: 11508  loss: 1.3480  grad_norm: 0.2140\n",
      "06/02 04:14:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3540/6543]  lr: 9.1620e-05  eta: 0:28:40  time: 0.4925  data_time: 0.0080  memory: 11508  loss: 1.3178  grad_norm: 0.2129\n",
      "06/02 04:14:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3550/6543]  lr: 9.1127e-05  eta: 0:28:33  time: 0.4916  data_time: 0.0077  memory: 11508  loss: 1.3249  grad_norm: 0.2129\n",
      "06/02 04:14:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3560/6543]  lr: 9.0634e-05  eta: 0:28:27  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3383  grad_norm: 0.2072\n",
      "06/02 04:14:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3570/6543]  lr: 9.0141e-05  eta: 0:28:20  time: 0.4931  data_time: 0.0078  memory: 11508  loss: 1.3330  grad_norm: 0.2089\n",
      "06/02 04:15:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3580/6543]  lr: 8.9649e-05  eta: 0:28:14  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.3630  grad_norm: 0.2089\n",
      "06/02 04:15:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3590/6543]  lr: 8.9157e-05  eta: 0:28:08  time: 0.4930  data_time: 0.0081  memory: 11508  loss: 1.3299  grad_norm: 0.2033\n",
      "06/02 04:15:11 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3600/6543]  lr: 8.8665e-05  eta: 0:28:01  time: 0.4924  data_time: 0.0077  memory: 11508  loss: 1.3522  grad_norm: 0.2024\n",
      "06/02 04:15:16 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3610/6543]  lr: 8.8173e-05  eta: 0:27:55  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.3051  grad_norm: 0.2024\n",
      "06/02 04:15:21 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3620/6543]  lr: 8.7682e-05  eta: 0:27:49  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3062  grad_norm: 0.2007\n",
      "06/02 04:15:26 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3630/6543]  lr: 8.7191e-05  eta: 0:27:42  time: 0.4919  data_time: 0.0078  memory: 11508  loss: 1.3660  grad_norm: 0.2007\n",
      "06/02 04:15:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3640/6543]  lr: 8.6700e-05  eta: 0:27:36  time: 0.4929  data_time: 0.0081  memory: 11508  loss: 1.2818  grad_norm: 0.1971\n",
      "06/02 04:15:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3650/6543]  lr: 8.6210e-05  eta: 0:27:30  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3328  grad_norm: 0.1896\n",
      "06/02 04:15:41 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3660/6543]  lr: 8.5720e-05  eta: 0:27:23  time: 0.4922  data_time: 0.0080  memory: 11508  loss: 1.3682  grad_norm: 0.1896\n",
      "06/02 04:15:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3670/6543]  lr: 8.5230e-05  eta: 0:27:17  time: 0.4924  data_time: 0.0077  memory: 11508  loss: 1.3279  grad_norm: 0.1892\n",
      "06/02 04:15:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3680/6543]  lr: 8.4740e-05  eta: 0:27:11  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3627  grad_norm: 0.1790\n",
      "06/02 04:15:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3690/6543]  lr: 8.4251e-05  eta: 0:27:04  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.3898  grad_norm: 0.1790\n",
      "06/02 04:16:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3700/6543]  lr: 8.3763e-05  eta: 0:26:58  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3465  grad_norm: 0.1816\n",
      "06/02 04:16:05 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3710/6543]  lr: 8.3275e-05  eta: 0:26:52  time: 0.4922  data_time: 0.0080  memory: 11508  loss: 1.3957  grad_norm: 0.1816\n",
      "06/02 04:16:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3720/6543]  lr: 8.2787e-05  eta: 0:26:46  time: 0.4929  data_time: 0.0081  memory: 11508  loss: 1.3371  grad_norm: 0.1843\n",
      "06/02 04:16:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3730/6543]  lr: 8.2299e-05  eta: 0:26:39  time: 0.4927  data_time: 0.0078  memory: 11508  loss: 1.4019  grad_norm: 0.1835\n",
      "06/02 04:16:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3740/6543]  lr: 8.1813e-05  eta: 0:26:33  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.3889  grad_norm: 0.1835\n",
      "06/02 04:16:25 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3750/6543]  lr: 8.1326e-05  eta: 0:26:27  time: 0.4928  data_time: 0.0078  memory: 11508  loss: 1.3322  grad_norm: 0.1839\n",
      "06/02 04:16:30 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3760/6543]  lr: 8.0840e-05  eta: 0:26:21  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.2952  grad_norm: 0.1860\n",
      "06/02 04:16:35 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3770/6543]  lr: 8.0354e-05  eta: 0:26:14  time: 0.4917  data_time: 0.0077  memory: 11508  loss: 1.3156  grad_norm: 0.1860\n",
      "06/02 04:16:40 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3780/6543]  lr: 7.9869e-05  eta: 0:26:08  time: 0.4930  data_time: 0.0081  memory: 11508  loss: 1.3045  grad_norm: 0.1940\n",
      "06/02 04:16:45 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3790/6543]  lr: 7.9385e-05  eta: 0:26:02  time: 0.4921  data_time: 0.0080  memory: 11508  loss: 1.3768  grad_norm: 0.1940\n",
      "06/02 04:16:50 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3800/6543]  lr: 7.8901e-05  eta: 0:25:56  time: 0.4931  data_time: 0.0081  memory: 11508  loss: 1.3718  grad_norm: 0.1934\n",
      "06/02 04:16:55 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3810/6543]  lr: 7.8417e-05  eta: 0:25:49  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3257  grad_norm: 0.1953\n",
      "06/02 04:17:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3820/6543]  lr: 7.7934e-05  eta: 0:25:43  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.3644  grad_norm: 0.1953\n",
      "06/02 04:17:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3830/6543]  lr: 7.7452e-05  eta: 0:25:37  time: 0.4929  data_time: 0.0081  memory: 11508  loss: 1.3267  grad_norm: 0.1967\n",
      "06/02 04:17:09 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3840/6543]  lr: 7.6970e-05  eta: 0:25:31  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.2928  grad_norm: 0.1980\n",
      "06/02 04:17:14 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3850/6543]  lr: 7.6488e-05  eta: 0:25:25  time: 0.4924  data_time: 0.0081  memory: 11508  loss: 1.2427  grad_norm: 0.1980\n",
      "06/02 04:17:19 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3860/6543]  lr: 7.6007e-05  eta: 0:25:18  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.4096  grad_norm: 0.2007\n",
      "06/02 04:17:24 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3870/6543]  lr: 7.5527e-05  eta: 0:25:12  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.3092  grad_norm: 0.2007\n",
      "06/02 04:17:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3880/6543]  lr: 7.5048e-05  eta: 0:25:06  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.2492  grad_norm: 0.2042\n",
      "06/02 04:17:34 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3890/6543]  lr: 7.4569e-05  eta: 0:25:00  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3927  grad_norm: 0.2033\n",
      "06/02 04:17:39 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3900/6543]  lr: 7.4090e-05  eta: 0:24:54  time: 0.4927  data_time: 0.0083  memory: 11508  loss: 1.3390  grad_norm: 0.2033\n",
      "06/02 04:17:44 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3910/6543]  lr: 7.3612e-05  eta: 0:24:48  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.3278  grad_norm: 0.2080\n",
      "06/02 04:17:49 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3920/6543]  lr: 7.3135e-05  eta: 0:24:42  time: 0.4930  data_time: 0.0081  memory: 11508  loss: 1.3557  grad_norm: 0.2034\n",
      "06/02 04:17:54 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3930/6543]  lr: 7.2659e-05  eta: 0:24:35  time: 0.4916  data_time: 0.0078  memory: 11508  loss: 1.3445  grad_norm: 0.2034\n",
      "06/02 04:17:59 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3940/6543]  lr: 7.2183e-05  eta: 0:24:29  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.2667  grad_norm: 0.1973\n",
      "06/02 04:18:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3950/6543]  lr: 7.1708e-05  eta: 0:24:23  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.4089  grad_norm: 0.1973\n",
      "06/02 04:18:09 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3960/6543]  lr: 7.1234e-05  eta: 0:24:17  time: 0.4933  data_time: 0.0087  memory: 11508  loss: 1.3374  grad_norm: 0.1972\n",
      "06/02 04:18:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3970/6543]  lr: 7.0760e-05  eta: 0:24:11  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.2959  grad_norm: 0.1942\n",
      "06/02 04:18:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3980/6543]  lr: 7.0287e-05  eta: 0:24:05  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.2804  grad_norm: 0.1942\n",
      "06/02 04:18:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [3990/6543]  lr: 6.9815e-05  eta: 0:23:59  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.2604  grad_norm: 0.1981\n",
      "06/02 04:18:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Exp name: gemma_2b_it_qlora_alpaca_e3_20240602_034031\n",
      "06/02 04:18:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4000/6543]  lr: 6.9343e-05  eta: 0:23:53  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.3944  grad_norm: 0.1972\n",
      "06/02 04:18:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 04:19:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海天坛\n",
      "5. 上海中山公园 \n",
      "6. 上海南山公园\n",
      "7. 上海人民公园\n",
      "8. 上海市博物馆\n",
      "9. 上海科学技术馆\n",
      "10. 上海海洋博物馆 \n",
      "11. 上海博物馆\n",
      "12. 上海自然博物馆\n",
      "13. 上海自然科学博物馆\n",
      "14. 上海自然史博物馆\n",
      "15. 上海自然史博物馆\n",
      "16. 上海自然史博物馆\n",
      "17. 上海自然史博物馆\n",
      "18. 上海自然史博物馆\n",
      "19. 上海自然史博物馆\n",
      "20. 上海自然史博物馆\n",
      "21. 上海自然史博物馆\n",
      "22. 上海自然史博物馆\n",
      "23. 上海自然史博物馆\n",
      "24. 上海自然史博物馆\n",
      "25. 上海自然史博物馆\n",
      "26. 上海自然史博物馆\n",
      "27. 上海自然史博物馆\n",
      "28. 上海自然史博物馆\n",
      "29. 上海自然史博物馆\n",
      "30. 上海自然史博物馆\n",
      "31. 上海自然史博物馆\n",
      "32. 上海自然史博物馆\n",
      "33. 上海自然史博物馆\n",
      "34. 上海自然史博物馆\n",
      "35. 上海自然史博物馆\n",
      "36. 上海自然史博物馆\n",
      "37. 上海自然史博物馆\n",
      "38. 上海自然史博物馆\n",
      "39. 上海自然史博物馆\n",
      "40. 上海自然史博物馆\n",
      "41. 上海自然史博物馆\n",
      "42. 上海自然史博物馆\n",
      "43. 上海自然史博物馆\n",
      "44. 上海自然史博物馆\n",
      "45. 上海自然史博物馆\n",
      "46. 上海自然史博物馆\n",
      "47. 上海自然史博物馆\n",
      "48. 上海自然史博物馆\n",
      "49. 上海自然史博物馆\n",
      "50. 上海自然史博物馆\n",
      "51. 上海自然史博物馆\n",
      "52. 上海自然史博物馆\n",
      "53. 上海自然史博物馆\n",
      "54. 上海自然史博物馆\n",
      "55. 上海自然史博物馆\n",
      "56. 上海自然史博物馆\n",
      "57. 上海自然史博物馆\n",
      "58. 上海自然史博物馆\n",
      "59. 上海自然史博物馆\n",
      "60. 上海自然史博物馆\n",
      "61. 上海自然史博物馆\n",
      "62. 上海自然史博物馆\n",
      "63. 上海自然史博物馆\n",
      "64. 上海自然史博物馆\n",
      "65. 上海自然史博物馆\n",
      "66. 上海自然史博物馆\n",
      "67. 上海自然史博物馆\n",
      "68. 上海自然史博物馆\n",
      "69. 上海自然史博物馆\n",
      "70. 上海自然史博物馆\n",
      "71. 上海自然史博物馆\n",
      "72. 上海自然史博物馆\n",
      "73. 上海自然史博物馆\n",
      "74. 上海自然史博物馆\n",
      "75. 上海自然史博物馆\n",
      "76. 上海自然史博物馆\n",
      "77. 上海自然史博物馆\n",
      "78.\n",
      "\n",
      "06/02 04:19:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. The Bund\n",
      "2. The Oriental Pearl Tower\n",
      "3. The Shanghai Tower\n",
      "4. The Yu Garden\n",
      "5. The Shanghai Disneyland Resort<end_of_turn>\n",
      "\n",
      "06/02 04:19:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 4000 iterations\n",
      "06/02 04:19:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4010/6543]  lr: 6.8872e-05  eta: 0:24:17  time: 5.3684  data_time: 4.8812  memory: 11508  loss: 1.2633  grad_norm: 0.1972\n",
      "06/02 04:19:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4020/6543]  lr: 6.8402e-05  eta: 0:24:11  time: 0.4924  data_time: 0.0079  memory: 11508  loss: 1.3614  grad_norm: 0.1941\n",
      "06/02 04:19:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4030/6543]  lr: 6.7933e-05  eta: 0:24:05  time: 0.4914  data_time: 0.0078  memory: 11508  loss: 1.3148  grad_norm: 0.1941\n",
      "06/02 04:19:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4040/6543]  lr: 6.7465e-05  eta: 0:23:59  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.3330  grad_norm: 0.1848\n",
      "06/02 04:19:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4050/6543]  lr: 6.6997e-05  eta: 0:23:52  time: 0.4923  data_time: 0.0078  memory: 11508  loss: 1.3846  grad_norm: 0.1877\n",
      "06/02 04:19:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4060/6543]  lr: 6.6530e-05  eta: 0:23:46  time: 0.4923  data_time: 0.0082  memory: 11508  loss: 1.3454  grad_norm: 0.1877\n",
      "06/02 04:19:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4070/6543]  lr: 6.6064e-05  eta: 0:23:40  time: 0.4928  data_time: 0.0082  memory: 11508  loss: 1.3298  grad_norm: 0.1823\n",
      "06/02 04:19:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4080/6543]  lr: 6.5599e-05  eta: 0:23:34  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3565  grad_norm: 0.1828\n",
      "06/02 04:20:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4090/6543]  lr: 6.5135e-05  eta: 0:23:27  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.4059  grad_norm: 0.1828\n",
      "06/02 04:20:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4100/6543]  lr: 6.4671e-05  eta: 0:23:21  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3678  grad_norm: 0.1844\n",
      "06/02 04:20:11 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4110/6543]  lr: 6.4209e-05  eta: 0:23:15  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.2938  grad_norm: 0.1844\n",
      "06/02 04:20:16 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4120/6543]  lr: 6.3747e-05  eta: 0:23:09  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3525  grad_norm: 0.1857\n",
      "06/02 04:20:21 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4130/6543]  lr: 6.3286e-05  eta: 0:23:02  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.3153  grad_norm: 0.1868\n",
      "06/02 04:20:26 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4140/6543]  lr: 6.2826e-05  eta: 0:22:56  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.2880  grad_norm: 0.1868\n",
      "06/02 04:20:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4150/6543]  lr: 6.2367e-05  eta: 0:22:50  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.2844  grad_norm: 0.1764\n",
      "06/02 04:20:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4160/6543]  lr: 6.1909e-05  eta: 0:22:44  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.2910  grad_norm: 0.1773\n",
      "06/02 04:20:41 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4170/6543]  lr: 6.1452e-05  eta: 0:22:38  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.3720  grad_norm: 0.1773\n",
      "06/02 04:20:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4180/6543]  lr: 6.0995e-05  eta: 0:22:32  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3372  grad_norm: 0.1801\n",
      "06/02 04:20:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4190/6543]  lr: 6.0540e-05  eta: 0:22:25  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.3097  grad_norm: 0.1801\n",
      "06/02 04:20:55 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4200/6543]  lr: 6.0086e-05  eta: 0:22:19  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3660  grad_norm: 0.1808\n",
      "06/02 04:21:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4210/6543]  lr: 5.9633e-05  eta: 0:22:13  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3137  grad_norm: 0.1807\n",
      "06/02 04:21:05 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4220/6543]  lr: 5.9180e-05  eta: 0:22:07  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3304  grad_norm: 0.1807\n",
      "06/02 04:21:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4230/6543]  lr: 5.8729e-05  eta: 0:22:01  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3802  grad_norm: 0.1855\n",
      "06/02 04:21:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4240/6543]  lr: 5.8278e-05  eta: 0:21:55  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.3422  grad_norm: 0.1872\n",
      "06/02 04:21:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4250/6543]  lr: 5.7829e-05  eta: 0:21:48  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.2903  grad_norm: 0.1872\n",
      "06/02 04:21:25 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4260/6543]  lr: 5.7381e-05  eta: 0:21:42  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.3825  grad_norm: 0.1840\n",
      "06/02 04:21:30 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4270/6543]  lr: 5.6934e-05  eta: 0:21:36  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.2650  grad_norm: 0.1840\n",
      "06/02 04:21:35 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4280/6543]  lr: 5.6487e-05  eta: 0:21:30  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3051  grad_norm: 0.1847\n",
      "06/02 04:21:40 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4290/6543]  lr: 5.6042e-05  eta: 0:21:24  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3361  grad_norm: 0.1862\n",
      "06/02 04:21:45 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4300/6543]  lr: 5.5598e-05  eta: 0:21:18  time: 0.4923  data_time: 0.0081  memory: 11508  loss: 1.3295  grad_norm: 0.1862\n",
      "06/02 04:21:50 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4310/6543]  lr: 5.5155e-05  eta: 0:21:12  time: 0.4923  data_time: 0.0077  memory: 11508  loss: 1.3753  grad_norm: 0.1870\n",
      "06/02 04:21:55 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4320/6543]  lr: 5.4713e-05  eta: 0:21:06  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.3590  grad_norm: 0.1883\n",
      "06/02 04:21:59 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4330/6543]  lr: 5.4273e-05  eta: 0:21:00  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.3384  grad_norm: 0.1883\n",
      "06/02 04:22:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4340/6543]  lr: 5.3833e-05  eta: 0:20:54  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3853  grad_norm: 0.1850\n",
      "06/02 04:22:09 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4350/6543]  lr: 5.3395e-05  eta: 0:20:47  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3539  grad_norm: 0.1850\n",
      "06/02 04:22:14 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4360/6543]  lr: 5.2957e-05  eta: 0:20:41  time: 0.4929  data_time: 0.0082  memory: 11508  loss: 1.3985  grad_norm: 0.1843\n",
      "06/02 04:22:21 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4370/6543]  lr: 5.2521e-05  eta: 0:20:36  time: 0.6959  data_time: 0.2083  memory: 11508  loss: 1.2866  grad_norm: 0.1837\n",
      "06/02 04:22:26 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4380/6543]  lr: 5.2086e-05  eta: 0:20:30  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.3657  grad_norm: 0.1837\n",
      "06/02 04:22:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4390/6543]  lr: 5.1652e-05  eta: 0:20:24  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.3532  grad_norm: 0.1796\n",
      "06/02 04:22:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4400/6543]  lr: 5.1219e-05  eta: 0:20:18  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3268  grad_norm: 0.1822\n",
      "06/02 04:22:41 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4410/6543]  lr: 5.0788e-05  eta: 0:20:12  time: 0.4921  data_time: 0.0080  memory: 11508  loss: 1.3081  grad_norm: 0.1822\n",
      "06/02 04:22:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4420/6543]  lr: 5.0358e-05  eta: 0:20:06  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2777  grad_norm: 0.1858\n",
      "06/02 04:22:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4430/6543]  lr: 4.9929e-05  eta: 0:20:00  time: 0.4924  data_time: 0.0081  memory: 11508  loss: 1.2807  grad_norm: 0.1858\n",
      "06/02 04:22:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4440/6543]  lr: 4.9501e-05  eta: 0:19:54  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2679  grad_norm: 0.1850\n",
      "06/02 04:23:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4450/6543]  lr: 4.9074e-05  eta: 0:19:48  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.2823  grad_norm: 0.1837\n",
      "06/02 04:23:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4460/6543]  lr: 4.8649e-05  eta: 0:19:42  time: 0.4923  data_time: 0.0081  memory: 11508  loss: 1.2355  grad_norm: 0.1837\n",
      "06/02 04:23:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4470/6543]  lr: 4.8225e-05  eta: 0:19:36  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2000  grad_norm: 0.1858\n",
      "06/02 04:23:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4480/6543]  lr: 4.7802e-05  eta: 0:19:30  time: 0.4927  data_time: 0.0078  memory: 11508  loss: 1.2578  grad_norm: 0.1904\n",
      "06/02 04:23:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4490/6543]  lr: 4.7380e-05  eta: 0:19:24  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.2638  grad_norm: 0.1904\n",
      "06/02 04:23:25 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4500/6543]  lr: 4.6960e-05  eta: 0:19:18  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3147  grad_norm: 0.1917\n",
      "06/02 04:23:25 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 04:23:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海天坛\n",
      "5. 上海南山公园<end_of_turn>\n",
      "\n",
      "06/02 04:23:30 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. The Bund\n",
      "2. The Oriental Pearl Tower\n",
      "3. The Shanghai Tower\n",
      "4. The Yu Garden\n",
      "5. The Shanghai Disneyland Resort<end_of_turn>\n",
      "\n",
      "06/02 04:23:30 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 4500 iterations\n",
      "06/02 04:23:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4510/6543]  lr: 4.6541e-05  eta: 0:19:15  time: 1.1799  data_time: 0.6936  memory: 11508  loss: 1.2751  grad_norm: 0.1917\n",
      "06/02 04:23:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4520/6543]  lr: 4.6123e-05  eta: 0:19:09  time: 0.4928  data_time: 0.0081  memory: 11508  loss: 1.3228  grad_norm: 0.1967\n",
      "06/02 04:23:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4530/6543]  lr: 4.5707e-05  eta: 0:19:03  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3832  grad_norm: 0.2029\n",
      "06/02 04:23:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4540/6543]  lr: 4.5292e-05  eta: 0:18:57  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3289  grad_norm: 0.2029\n",
      "06/02 04:23:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4550/6543]  lr: 4.4878e-05  eta: 0:18:51  time: 0.4930  data_time: 0.0082  memory: 11508  loss: 1.3080  grad_norm: 0.2113\n",
      "06/02 04:24:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4560/6543]  lr: 4.4466e-05  eta: 0:18:45  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.2847  grad_norm: 0.2077\n",
      "06/02 04:24:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4570/6543]  lr: 4.4055e-05  eta: 0:18:39  time: 0.4924  data_time: 0.0080  memory: 11508  loss: 1.3286  grad_norm: 0.2077\n",
      "06/02 04:24:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4580/6543]  lr: 4.3646e-05  eta: 0:18:33  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.3225  grad_norm: 0.2068\n",
      "06/02 04:24:16 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4590/6543]  lr: 4.3237e-05  eta: 0:18:27  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.2806  grad_norm: 0.2068\n",
      "06/02 04:24:21 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4600/6543]  lr: 4.2831e-05  eta: 0:18:21  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.2363  grad_norm: 0.2105\n",
      "06/02 04:24:26 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4610/6543]  lr: 4.2425e-05  eta: 0:18:15  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2796  grad_norm: 0.2099\n",
      "06/02 04:24:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4620/6543]  lr: 4.2021e-05  eta: 0:18:09  time: 0.4919  data_time: 0.0080  memory: 11508  loss: 1.2691  grad_norm: 0.2099\n",
      "06/02 04:24:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4630/6543]  lr: 4.1619e-05  eta: 0:18:03  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3161  grad_norm: 0.2079\n",
      "06/02 04:24:41 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4640/6543]  lr: 4.1217e-05  eta: 0:17:57  time: 0.4931  data_time: 0.0081  memory: 11508  loss: 1.2741  grad_norm: 0.2004\n",
      "06/02 04:24:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4650/6543]  lr: 4.0818e-05  eta: 0:17:51  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.2024  grad_norm: 0.2004\n",
      "06/02 04:24:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4660/6543]  lr: 4.0420e-05  eta: 0:17:45  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.3363  grad_norm: 0.1994\n",
      "06/02 04:24:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4670/6543]  lr: 4.0023e-05  eta: 0:17:39  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.2878  grad_norm: 0.1994\n",
      "06/02 04:25:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4680/6543]  lr: 3.9627e-05  eta: 0:17:33  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.2727  grad_norm: 0.1965\n",
      "06/02 04:25:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4690/6543]  lr: 3.9234e-05  eta: 0:17:27  time: 0.4930  data_time: 0.0080  memory: 11508  loss: 1.2573  grad_norm: 0.1891\n",
      "06/02 04:25:11 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4700/6543]  lr: 3.8841e-05  eta: 0:17:21  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.3011  grad_norm: 0.1891\n",
      "06/02 04:25:16 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4710/6543]  lr: 3.8450e-05  eta: 0:17:15  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.3077  grad_norm: 0.1805\n",
      "06/02 04:25:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4720/6543]  lr: 3.8061e-05  eta: 0:17:09  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.3817  grad_norm: 0.1814\n",
      "06/02 04:25:25 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4730/6543]  lr: 3.7673e-05  eta: 0:17:03  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.2609  grad_norm: 0.1814\n",
      "06/02 04:25:30 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4740/6543]  lr: 3.7287e-05  eta: 0:16:58  time: 0.4931  data_time: 0.0079  memory: 11508  loss: 1.3199  grad_norm: 0.1794\n",
      "06/02 04:25:35 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4750/6543]  lr: 3.6902e-05  eta: 0:16:52  time: 0.4916  data_time: 0.0077  memory: 11508  loss: 1.2501  grad_norm: 0.1794\n",
      "06/02 04:25:40 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4760/6543]  lr: 3.6519e-05  eta: 0:16:46  time: 0.4929  data_time: 0.0081  memory: 11508  loss: 1.3090  grad_norm: 0.1765\n",
      "06/02 04:25:45 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4770/6543]  lr: 3.6137e-05  eta: 0:16:40  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.2041  grad_norm: 0.1771\n",
      "06/02 04:25:50 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4780/6543]  lr: 3.5757e-05  eta: 0:16:34  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.2784  grad_norm: 0.1771\n",
      "06/02 04:25:55 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4790/6543]  lr: 3.5379e-05  eta: 0:16:28  time: 0.4931  data_time: 0.0079  memory: 11508  loss: 1.2821  grad_norm: 0.1787\n",
      "06/02 04:26:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4800/6543]  lr: 3.5002e-05  eta: 0:16:22  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.3129  grad_norm: 0.1790\n",
      "06/02 04:26:05 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4810/6543]  lr: 3.4626e-05  eta: 0:16:16  time: 0.4926  data_time: 0.0080  memory: 11508  loss: 1.2950  grad_norm: 0.1790\n",
      "06/02 04:26:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4820/6543]  lr: 3.4252e-05  eta: 0:16:10  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.2194  grad_norm: 0.1785\n",
      "06/02 04:26:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4830/6543]  lr: 3.3880e-05  eta: 0:16:04  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.2695  grad_norm: 0.1785\n",
      "06/02 04:26:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4840/6543]  lr: 3.3510e-05  eta: 0:15:59  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.2171  grad_norm: 0.1783\n",
      "06/02 04:26:24 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4850/6543]  lr: 3.3141e-05  eta: 0:15:53  time: 0.4929  data_time: 0.0081  memory: 11508  loss: 1.2968  grad_norm: 0.1778\n",
      "06/02 04:26:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4860/6543]  lr: 3.2774e-05  eta: 0:15:47  time: 0.4921  data_time: 0.0080  memory: 11508  loss: 1.2987  grad_norm: 0.1778\n",
      "06/02 04:26:34 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4870/6543]  lr: 3.2408e-05  eta: 0:15:41  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.2878  grad_norm: 0.1789\n",
      "06/02 04:26:39 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4880/6543]  lr: 3.2044e-05  eta: 0:15:35  time: 0.4930  data_time: 0.0081  memory: 11508  loss: 1.2576  grad_norm: 0.1784\n",
      "06/02 04:26:44 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4890/6543]  lr: 3.1682e-05  eta: 0:15:29  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.2827  grad_norm: 0.1784\n",
      "06/02 04:26:49 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4900/6543]  lr: 3.1321e-05  eta: 0:15:23  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.2712  grad_norm: 0.1782\n",
      "06/02 04:26:54 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4910/6543]  lr: 3.0962e-05  eta: 0:15:17  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.2834  grad_norm: 0.1782\n",
      "06/02 04:26:59 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4920/6543]  lr: 3.0605e-05  eta: 0:15:12  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.2748  grad_norm: 0.1784\n",
      "06/02 04:27:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4930/6543]  lr: 3.0249e-05  eta: 0:15:06  time: 0.4931  data_time: 0.0082  memory: 11508  loss: 1.2657  grad_norm: 0.1786\n",
      "06/02 04:27:09 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4940/6543]  lr: 2.9896e-05  eta: 0:15:00  time: 0.4924  data_time: 0.0081  memory: 11508  loss: 1.2935  grad_norm: 0.1786\n",
      "06/02 04:27:14 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4950/6543]  lr: 2.9544e-05  eta: 0:14:54  time: 0.4932  data_time: 0.0080  memory: 11508  loss: 1.3063  grad_norm: 0.1796\n",
      "06/02 04:27:19 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4960/6543]  lr: 2.9193e-05  eta: 0:14:48  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.2726  grad_norm: 0.1808\n",
      "06/02 04:27:24 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4970/6543]  lr: 2.8844e-05  eta: 0:14:42  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.2563  grad_norm: 0.1808\n",
      "06/02 04:27:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4980/6543]  lr: 2.8498e-05  eta: 0:14:37  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2501  grad_norm: 0.1811\n",
      "06/02 04:27:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [4990/6543]  lr: 2.8152e-05  eta: 0:14:31  time: 0.4916  data_time: 0.0078  memory: 11508  loss: 1.2860  grad_norm: 0.1811\n",
      "06/02 04:27:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Exp name: gemma_2b_it_qlora_alpaca_e3_20240602_034031\n",
      "06/02 04:27:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5000/6543]  lr: 2.7809e-05  eta: 0:14:25  time: 0.4932  data_time: 0.0083  memory: 11508  loss: 1.2373  grad_norm: 0.1801\n",
      "06/02 04:27:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 04:27:41 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海天坛\n",
      "5. 上海中山公园<end_of_turn>\n",
      "\n",
      "06/02 04:27:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. The Bund\n",
      "2. Oriental Pearl Tower\n",
      "3. Yu Garden\n",
      "4. The Shanghai Tower\n",
      "5. The Bund Waterfront Park<end_of_turn>\n",
      "\n",
      "06/02 04:27:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 5000 iterations\n",
      "06/02 04:27:50 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5010/6543]  lr: 2.7467e-05  eta: 0:14:21  time: 1.1532  data_time: 0.6667  memory: 11508  loss: 1.3226  grad_norm: 0.1819\n",
      "06/02 04:27:55 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5020/6543]  lr: 2.7128e-05  eta: 0:14:15  time: 0.4919  data_time: 0.0078  memory: 11508  loss: 1.3481  grad_norm: 0.1819\n",
      "06/02 04:28:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5030/6543]  lr: 2.6789e-05  eta: 0:14:09  time: 0.4932  data_time: 0.0083  memory: 11508  loss: 1.2976  grad_norm: 0.1828\n",
      "06/02 04:28:05 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5040/6543]  lr: 2.6453e-05  eta: 0:14:04  time: 0.4929  data_time: 0.0079  memory: 11508  loss: 1.3016  grad_norm: 0.1836\n",
      "06/02 04:28:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5050/6543]  lr: 2.6119e-05  eta: 0:13:58  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.3198  grad_norm: 0.1836\n",
      "06/02 04:28:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5060/6543]  lr: 2.5786e-05  eta: 0:13:52  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.2834  grad_norm: 0.1843\n",
      "06/02 04:28:19 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5070/6543]  lr: 2.5455e-05  eta: 0:13:46  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.2235  grad_norm: 0.1843\n",
      "06/02 04:28:24 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5080/6543]  lr: 2.5126e-05  eta: 0:13:40  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3128  grad_norm: 0.1840\n",
      "06/02 04:28:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5090/6543]  lr: 2.4799e-05  eta: 0:13:35  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.2736  grad_norm: 0.1842\n",
      "06/02 04:28:34 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5100/6543]  lr: 2.4474e-05  eta: 0:13:29  time: 0.4922  data_time: 0.0080  memory: 11508  loss: 1.2962  grad_norm: 0.1842\n",
      "06/02 04:28:39 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5110/6543]  lr: 2.4150e-05  eta: 0:13:23  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3027  grad_norm: 0.1823\n",
      "06/02 04:28:44 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5120/6543]  lr: 2.3829e-05  eta: 0:13:17  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.2359  grad_norm: 0.1812\n",
      "06/02 04:28:49 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5130/6543]  lr: 2.3509e-05  eta: 0:13:11  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.2827  grad_norm: 0.1812\n",
      "06/02 04:28:54 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5140/6543]  lr: 2.3191e-05  eta: 0:13:06  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3277  grad_norm: 0.1806\n",
      "06/02 04:28:59 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5150/6543]  lr: 2.2875e-05  eta: 0:13:00  time: 0.4923  data_time: 0.0081  memory: 11508  loss: 1.2777  grad_norm: 0.1806\n",
      "06/02 04:29:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5160/6543]  lr: 2.2561e-05  eta: 0:12:54  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.2595  grad_norm: 0.1805\n",
      "06/02 04:29:09 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5170/6543]  lr: 2.2249e-05  eta: 0:12:48  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2925  grad_norm: 0.1798\n",
      "06/02 04:29:14 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5180/6543]  lr: 2.1938e-05  eta: 0:12:42  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.2154  grad_norm: 0.1798\n",
      "06/02 04:29:19 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5190/6543]  lr: 2.1630e-05  eta: 0:12:37  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.3246  grad_norm: 0.1786\n",
      "06/02 04:29:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5200/6543]  lr: 2.1323e-05  eta: 0:12:31  time: 0.4931  data_time: 0.0081  memory: 11508  loss: 1.3474  grad_norm: 0.1795\n",
      "06/02 04:29:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5210/6543]  lr: 2.1019e-05  eta: 0:12:25  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.2209  grad_norm: 0.1795\n",
      "06/02 04:29:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5220/6543]  lr: 2.0716e-05  eta: 0:12:19  time: 0.4932  data_time: 0.0082  memory: 11508  loss: 1.3012  grad_norm: 0.1796\n",
      "06/02 04:29:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5230/6543]  lr: 2.0415e-05  eta: 0:12:14  time: 0.4928  data_time: 0.0087  memory: 11508  loss: 1.2589  grad_norm: 0.1796\n",
      "06/02 04:29:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5240/6543]  lr: 2.0117e-05  eta: 0:12:08  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3004  grad_norm: 0.1797\n",
      "06/02 04:29:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5250/6543]  lr: 1.9820e-05  eta: 0:12:02  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2495  grad_norm: 0.1793\n",
      "06/02 04:29:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5260/6543]  lr: 1.9525e-05  eta: 0:11:56  time: 0.4920  data_time: 0.0078  memory: 11508  loss: 1.3072  grad_norm: 0.1793\n",
      "06/02 04:29:58 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5270/6543]  lr: 1.9232e-05  eta: 0:11:51  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.2811  grad_norm: 0.1796\n",
      "06/02 04:30:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5280/6543]  lr: 1.8941e-05  eta: 0:11:45  time: 0.4930  data_time: 0.0080  memory: 11508  loss: 1.3600  grad_norm: 0.1815\n",
      "06/02 04:30:08 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5290/6543]  lr: 1.8653e-05  eta: 0:11:39  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3153  grad_norm: 0.1815\n",
      "06/02 04:30:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5300/6543]  lr: 1.8366e-05  eta: 0:11:33  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.2999  grad_norm: 0.1821\n",
      "06/02 04:30:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5310/6543]  lr: 1.8081e-05  eta: 0:11:28  time: 0.4919  data_time: 0.0078  memory: 11508  loss: 1.3108  grad_norm: 0.1821\n",
      "06/02 04:30:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5320/6543]  lr: 1.7798e-05  eta: 0:11:22  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.3042  grad_norm: 0.1832\n",
      "06/02 04:30:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5330/6543]  lr: 1.7517e-05  eta: 0:11:16  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3105  grad_norm: 0.1829\n",
      "06/02 04:30:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5340/6543]  lr: 1.7238e-05  eta: 0:11:10  time: 0.4922  data_time: 0.0080  memory: 11508  loss: 1.2720  grad_norm: 0.1829\n",
      "06/02 04:30:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5350/6543]  lr: 1.6961e-05  eta: 0:11:05  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2696  grad_norm: 0.1825\n",
      "06/02 04:30:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5360/6543]  lr: 1.6687e-05  eta: 0:10:59  time: 0.4928  data_time: 0.0079  memory: 11508  loss: 1.3236  grad_norm: 0.1808\n",
      "06/02 04:30:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5370/6543]  lr: 1.6414e-05  eta: 0:10:53  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.2527  grad_norm: 0.1808\n",
      "06/02 04:30:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5380/6543]  lr: 1.6143e-05  eta: 0:10:48  time: 0.4926  data_time: 0.0078  memory: 11508  loss: 1.2848  grad_norm: 0.1806\n",
      "06/02 04:30:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5390/6543]  lr: 1.5875e-05  eta: 0:10:42  time: 0.4923  data_time: 0.0081  memory: 11508  loss: 1.2541  grad_norm: 0.1806\n",
      "06/02 04:31:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5400/6543]  lr: 1.5608e-05  eta: 0:10:36  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3011  grad_norm: 0.1809\n",
      "06/02 04:31:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5410/6543]  lr: 1.5344e-05  eta: 0:10:30  time: 0.4930  data_time: 0.0081  memory: 11508  loss: 1.3148  grad_norm: 0.1818\n",
      "06/02 04:31:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5420/6543]  lr: 1.5081e-05  eta: 0:10:25  time: 0.4924  data_time: 0.0079  memory: 11508  loss: 1.3389  grad_norm: 0.1818\n",
      "06/02 04:31:17 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5430/6543]  lr: 1.4821e-05  eta: 0:10:19  time: 0.4924  data_time: 0.0077  memory: 11508  loss: 1.2797  grad_norm: 0.1815\n",
      "06/02 04:31:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5440/6543]  lr: 1.4563e-05  eta: 0:10:13  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.2839  grad_norm: 0.1801\n",
      "06/02 04:31:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5450/6543]  lr: 1.4306e-05  eta: 0:10:08  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.2920  grad_norm: 0.1801\n",
      "06/02 04:31:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5460/6543]  lr: 1.4052e-05  eta: 0:10:02  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.2979  grad_norm: 0.1802\n",
      "06/02 04:31:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5470/6543]  lr: 1.3800e-05  eta: 0:09:56  time: 0.4918  data_time: 0.0077  memory: 11508  loss: 1.2692  grad_norm: 0.1802\n",
      "06/02 04:31:41 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5480/6543]  lr: 1.3550e-05  eta: 0:09:51  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.2926  grad_norm: 0.1795\n",
      "06/02 04:31:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5490/6543]  lr: 1.3303e-05  eta: 0:09:45  time: 0.4930  data_time: 0.0078  memory: 11508  loss: 1.3408  grad_norm: 0.1806\n",
      "06/02 04:31:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5500/6543]  lr: 1.3057e-05  eta: 0:09:39  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.3243  grad_norm: 0.1806\n",
      "06/02 04:31:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 04:31:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海天坛\n",
      "5. 上海东方体育场<end_of_turn>\n",
      "\n",
      "06/02 04:31:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Five scenic spots in Shanghai are: \n",
      "1. The Bund \n",
      "2. The Oriental Pearl Tower \n",
      "3. The Shanghai Tower \n",
      "4. The Yu Garden \n",
      "5. The Qing Shan Temple<end_of_turn>\n",
      "\n",
      "06/02 04:31:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 5500 iterations\n",
      "06/02 04:32:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5510/6543]  lr: 1.2814e-05  eta: 0:09:35  time: 1.3036  data_time: 0.8160  memory: 11508  loss: 1.3019  grad_norm: 0.1810\n",
      "06/02 04:32:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5520/6543]  lr: 1.2572e-05  eta: 0:09:30  time: 0.5752  data_time: 0.0902  memory: 11508  loss: 1.2757  grad_norm: 0.1813\n",
      "06/02 04:32:15 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5530/6543]  lr: 1.2333e-05  eta: 0:09:24  time: 0.4921  data_time: 0.0079  memory: 11508  loss: 1.2503  grad_norm: 0.1813\n",
      "06/02 04:32:20 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5540/6543]  lr: 1.2096e-05  eta: 0:09:18  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.2386  grad_norm: 0.1813\n",
      "06/02 04:32:25 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5550/6543]  lr: 1.1861e-05  eta: 0:09:13  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3399  grad_norm: 0.1813\n",
      "06/02 04:32:30 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5560/6543]  lr: 1.1628e-05  eta: 0:09:07  time: 0.4930  data_time: 0.0081  memory: 11508  loss: 1.3583  grad_norm: 0.1816\n",
      "06/02 04:32:35 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5570/6543]  lr: 1.1398e-05  eta: 0:09:01  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.2887  grad_norm: 0.1824\n",
      "06/02 04:32:40 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5580/6543]  lr: 1.1169e-05  eta: 0:08:55  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.3322  grad_norm: 0.1824\n",
      "06/02 04:32:44 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5590/6543]  lr: 1.0943e-05  eta: 0:08:50  time: 0.4926  data_time: 0.0080  memory: 11508  loss: 1.2891  grad_norm: 0.1839\n",
      "06/02 04:32:49 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5600/6543]  lr: 1.0719e-05  eta: 0:08:44  time: 0.4927  data_time: 0.0078  memory: 11508  loss: 1.2914  grad_norm: 0.1850\n",
      "06/02 04:32:54 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5610/6543]  lr: 1.0497e-05  eta: 0:08:38  time: 0.4922  data_time: 0.0080  memory: 11508  loss: 1.3228  grad_norm: 0.1850\n",
      "06/02 04:32:59 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5620/6543]  lr: 1.0278e-05  eta: 0:08:33  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.3421  grad_norm: 0.1865\n",
      "06/02 04:33:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5630/6543]  lr: 1.0060e-05  eta: 0:08:27  time: 0.4922  data_time: 0.0080  memory: 11508  loss: 1.2123  grad_norm: 0.1865\n",
      "06/02 04:33:09 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5640/6543]  lr: 9.8449e-06  eta: 0:08:21  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.2956  grad_norm: 0.1868\n",
      "06/02 04:33:14 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5650/6543]  lr: 9.6318e-06  eta: 0:08:16  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.3147  grad_norm: 0.1870\n",
      "06/02 04:33:19 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5660/6543]  lr: 9.4210e-06  eta: 0:08:10  time: 0.4922  data_time: 0.0080  memory: 11508  loss: 1.2696  grad_norm: 0.1870\n",
      "06/02 04:33:24 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5670/6543]  lr: 9.2124e-06  eta: 0:08:04  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.3162  grad_norm: 0.1880\n",
      "06/02 04:33:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5680/6543]  lr: 9.0060e-06  eta: 0:07:59  time: 0.4931  data_time: 0.0081  memory: 11508  loss: 1.3064  grad_norm: 0.1880\n",
      "06/02 04:33:34 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5690/6543]  lr: 8.8018e-06  eta: 0:07:53  time: 0.4916  data_time: 0.0078  memory: 11508  loss: 1.3546  grad_norm: 0.1880\n",
      "06/02 04:33:39 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5700/6543]  lr: 8.5999e-06  eta: 0:07:48  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.2898  grad_norm: 0.1886\n",
      "06/02 04:33:44 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5710/6543]  lr: 8.4002e-06  eta: 0:07:42  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3272  grad_norm: 0.1886\n",
      "06/02 04:33:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5720/6543]  lr: 8.2027e-06  eta: 0:07:36  time: 0.4924  data_time: 0.0078  memory: 11508  loss: 1.2527  grad_norm: 0.1890\n",
      "06/02 04:33:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5730/6543]  lr: 8.0075e-06  eta: 0:07:31  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.3297  grad_norm: 0.1880\n",
      "06/02 04:33:58 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5740/6543]  lr: 7.8146e-06  eta: 0:07:25  time: 0.4919  data_time: 0.0078  memory: 11508  loss: 1.2947  grad_norm: 0.1880\n",
      "06/02 04:34:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5750/6543]  lr: 7.6239e-06  eta: 0:07:19  time: 0.4933  data_time: 0.0083  memory: 11508  loss: 1.3871  grad_norm: 0.1869\n",
      "06/02 04:34:08 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5760/6543]  lr: 7.4355e-06  eta: 0:07:14  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.2515  grad_norm: 0.1863\n",
      "06/02 04:34:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5770/6543]  lr: 7.2493e-06  eta: 0:07:08  time: 0.4918  data_time: 0.0078  memory: 11508  loss: 1.3188  grad_norm: 0.1863\n",
      "06/02 04:34:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5780/6543]  lr: 7.0654e-06  eta: 0:07:02  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.2369  grad_norm: 0.1845\n",
      "06/02 04:34:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5790/6543]  lr: 6.8838e-06  eta: 0:06:57  time: 0.4921  data_time: 0.0081  memory: 11508  loss: 1.3277  grad_norm: 0.1845\n",
      "06/02 04:34:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5800/6543]  lr: 6.7045e-06  eta: 0:06:51  time: 0.4928  data_time: 0.0081  memory: 11508  loss: 1.2600  grad_norm: 0.1850\n",
      "06/02 04:34:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5810/6543]  lr: 6.5274e-06  eta: 0:06:46  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.2300  grad_norm: 0.1841\n",
      "06/02 04:34:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5820/6543]  lr: 6.3527e-06  eta: 0:06:40  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.2350  grad_norm: 0.1841\n",
      "06/02 04:34:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5830/6543]  lr: 6.1802e-06  eta: 0:06:34  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2927  grad_norm: 0.1833\n",
      "06/02 04:34:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5840/6543]  lr: 6.0101e-06  eta: 0:06:29  time: 0.4927  data_time: 0.0078  memory: 11508  loss: 1.3093  grad_norm: 0.1837\n",
      "06/02 04:34:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5850/6543]  lr: 5.8422e-06  eta: 0:06:23  time: 0.4923  data_time: 0.0080  memory: 11508  loss: 1.3213  grad_norm: 0.1837\n",
      "06/02 04:34:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5860/6543]  lr: 5.6767e-06  eta: 0:06:18  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3374  grad_norm: 0.1831\n",
      "06/02 04:35:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5870/6543]  lr: 5.5134e-06  eta: 0:06:12  time: 0.4925  data_time: 0.0080  memory: 11508  loss: 1.2553  grad_norm: 0.1831\n",
      "06/02 04:35:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5880/6543]  lr: 5.3525e-06  eta: 0:06:06  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2275  grad_norm: 0.1833\n",
      "06/02 04:35:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5890/6543]  lr: 5.1939e-06  eta: 0:06:01  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3172  grad_norm: 0.1827\n",
      "06/02 04:35:17 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5900/6543]  lr: 5.0376e-06  eta: 0:05:55  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3296  grad_norm: 0.1827\n",
      "06/02 04:35:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5910/6543]  lr: 4.8836e-06  eta: 0:05:50  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.2630  grad_norm: 0.1842\n",
      "06/02 04:35:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5920/6543]  lr: 4.7320e-06  eta: 0:05:44  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.2410  grad_norm: 0.1852\n",
      "06/02 04:35:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5930/6543]  lr: 4.5827e-06  eta: 0:05:38  time: 0.4917  data_time: 0.0078  memory: 11508  loss: 1.2854  grad_norm: 0.1852\n",
      "06/02 04:35:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5940/6543]  lr: 4.4358e-06  eta: 0:05:33  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.3345  grad_norm: 0.1859\n",
      "06/02 04:35:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5950/6543]  lr: 4.2912e-06  eta: 0:05:27  time: 0.4921  data_time: 0.0080  memory: 11508  loss: 1.2459  grad_norm: 0.1859\n",
      "06/02 04:35:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5960/6543]  lr: 4.1489e-06  eta: 0:05:22  time: 0.4925  data_time: 0.0078  memory: 11508  loss: 1.2589  grad_norm: 0.1851\n",
      "06/02 04:35:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5970/6543]  lr: 4.0090e-06  eta: 0:05:16  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2910  grad_norm: 0.1854\n",
      "06/02 04:35:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5980/6543]  lr: 3.8714e-06  eta: 0:05:10  time: 0.4923  data_time: 0.0079  memory: 11508  loss: 1.2619  grad_norm: 0.1854\n",
      "06/02 04:36:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [5990/6543]  lr: 3.7362e-06  eta: 0:05:05  time: 0.4931  data_time: 0.0080  memory: 11508  loss: 1.2849  grad_norm: 0.1853\n",
      "06/02 04:36:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Exp name: gemma_2b_it_qlora_alpaca_e3_20240602_034031\n",
      "06/02 04:36:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6000/6543]  lr: 3.6033e-06  eta: 0:04:59  time: 0.4930  data_time: 0.0081  memory: 11508  loss: 1.2561  grad_norm: 0.1855\n",
      "06/02 04:36:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 04:36:50 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海天坛\n",
      "5. 上海体育场 \n",
      "6. 上海科学技术馆\n",
      "7. 上海博物馆\n",
      "8. 上海故宫\n",
      "9. 上海博物馆\n",
      "10. 上海虹桥 \n",
      "11. 上海天坛\n",
      "12. 上海体育场\n",
      "13. 上海科学技术馆\n",
      "14. 上海博物馆\n",
      "15. 上海故宫\n",
      "16. 上海博物馆\n",
      "17. 上海体育场\n",
      "18. 上海博物馆\n",
      "19. 上海故宫\n",
      "20. 上海博物馆\n",
      "21. 上海体育场\n",
      "22. 上海博物馆\n",
      "23. 上海故宫\n",
      "24. 上海博物馆\n",
      "25. 上海体育场\n",
      "26. 上海博物馆\n",
      "27. 上海故宫\n",
      "28. 上海博物馆\n",
      "29. 上海体育场\n",
      "30. 上海博物馆\n",
      "31. 上海故宫\n",
      "32. 上海博物馆\n",
      "33. 上海体育场\n",
      "34. 上海博物馆\n",
      "35. 上海故宫\n",
      "36. 上海博物馆\n",
      "37. 上海体育场\n",
      "38. 上海博物馆\n",
      "39. 上海故宫\n",
      "40. 上海博物馆\n",
      "41. 上海故宫\n",
      "42. 上海博物馆\n",
      "43. 上海故宫\n",
      "44. 上海博物馆\n",
      "45. 上海故宫\n",
      "46. 上海博物馆\n",
      "47. 上海故宫\n",
      "48. 上海博物馆\n",
      "49. 上海故宫\n",
      "50. 上海博物馆\n",
      "51. 上海故宫\n",
      "52. 上海博物馆\n",
      "53. 上海故宫\n",
      "54. 上海博物馆\n",
      "55. 上海故宫\n",
      "56. 上海博物馆\n",
      "57. 上海故宫\n",
      "58. 上海博物馆\n",
      "59. 上海故宫\n",
      "60. 上海博物馆\n",
      "61. 上海故宫\n",
      "62. 上海博物馆\n",
      "63. 上海故宫\n",
      "64. 上海博物馆\n",
      "65. 上海故宫\n",
      "66. 上海博物馆\n",
      "67. 上海博物馆\n",
      "68. 上海故宫\n",
      "69. 上海博物馆\n",
      "70. 上海故宫\n",
      "71. 上海博物馆\n",
      "72. 上海博物馆\n",
      "73. 上海故宫\n",
      "74. 上海博物馆\n",
      "75. 上海博物馆\n",
      "76. 上海故宫\n",
      "77. 上海博物馆\n",
      "78. 上海博物馆\n",
      "79. 上海博物馆\n",
      "80. 上海博物馆\n",
      "81. 上海博物馆\n",
      "82. 上海博物馆\n",
      "83. 上海博物馆\n",
      "84. 上海博物馆\n",
      "85. 上海博物馆\n",
      "86. 上海博物馆\n",
      "87. 上海博物馆\n",
      "88. 上海博物馆\n",
      "89. 上海博物馆\n",
      "90. 上海博物馆\n",
      "91. 上海博物馆\n",
      "92. 上海博物馆\n",
      "93. 上海博物馆\n",
      "94. 上海博物馆\n",
      "9\n",
      "\n",
      "06/02 04:36:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Five scenic spots in Shanghai are the Bund, the Oriental Pearl Tower, the Yu Garden, the Shanghai Disneyland, and the Qing Shan Temple.<end_of_turn>\n",
      "\n",
      "06/02 04:36:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 6000 iterations\n",
      "06/02 04:37:00 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6010/6543]  lr: 3.4728e-06  eta: 0:04:58  time: 5.3321  data_time: 4.8460  memory: 11508  loss: 1.2958  grad_norm: 0.1855\n",
      "06/02 04:37:05 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6020/6543]  lr: 3.3447e-06  eta: 0:04:52  time: 0.4927  data_time: 0.0081  memory: 11508  loss: 1.2541  grad_norm: 0.1860\n",
      "06/02 04:37:10 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6030/6543]  lr: 3.2190e-06  eta: 0:04:47  time: 0.4918  data_time: 0.0080  memory: 11508  loss: 1.3132  grad_norm: 0.1860\n",
      "06/02 04:37:14 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6040/6543]  lr: 3.0956e-06  eta: 0:04:41  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.2753  grad_norm: 0.1867\n",
      "06/02 04:37:19 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6050/6543]  lr: 2.9746e-06  eta: 0:04:35  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.2301  grad_norm: 0.1882\n",
      "06/02 04:37:24 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6060/6543]  lr: 2.8559e-06  eta: 0:04:30  time: 0.4924  data_time: 0.0081  memory: 11508  loss: 1.2543  grad_norm: 0.1882\n",
      "06/02 04:37:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6070/6543]  lr: 2.7397e-06  eta: 0:04:24  time: 0.4926  data_time: 0.0080  memory: 11508  loss: 1.3252  grad_norm: 0.1866\n",
      "06/02 04:37:34 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6080/6543]  lr: 2.6258e-06  eta: 0:04:18  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3329  grad_norm: 0.1853\n",
      "06/02 04:37:39 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6090/6543]  lr: 2.5143e-06  eta: 0:04:13  time: 0.4918  data_time: 0.0080  memory: 11508  loss: 1.3638  grad_norm: 0.1853\n",
      "06/02 04:37:44 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6100/6543]  lr: 2.4052e-06  eta: 0:04:07  time: 0.4929  data_time: 0.0081  memory: 11508  loss: 1.2429  grad_norm: 0.1852\n",
      "06/02 04:37:49 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6110/6543]  lr: 2.2985e-06  eta: 0:04:02  time: 0.4920  data_time: 0.0080  memory: 11508  loss: 1.2710  grad_norm: 0.1852\n",
      "06/02 04:37:54 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6120/6543]  lr: 2.1942e-06  eta: 0:03:56  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.2278  grad_norm: 0.1861\n",
      "06/02 04:37:59 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6130/6543]  lr: 2.0922e-06  eta: 0:03:50  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.2901  grad_norm: 0.1860\n",
      "06/02 04:38:04 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6140/6543]  lr: 1.9927e-06  eta: 0:03:45  time: 0.4922  data_time: 0.0081  memory: 11508  loss: 1.2118  grad_norm: 0.1860\n",
      "06/02 04:38:09 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6150/6543]  lr: 1.8956e-06  eta: 0:03:39  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.3340  grad_norm: 0.1877\n",
      "06/02 04:38:14 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6160/6543]  lr: 1.8009e-06  eta: 0:03:33  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.2814  grad_norm: 0.1875\n",
      "06/02 04:38:19 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6170/6543]  lr: 1.7086e-06  eta: 0:03:28  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.2986  grad_norm: 0.1875\n",
      "06/02 04:38:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6180/6543]  lr: 1.6187e-06  eta: 0:03:22  time: 0.4931  data_time: 0.0082  memory: 11508  loss: 1.3181  grad_norm: 0.1869\n",
      "06/02 04:38:28 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6190/6543]  lr: 1.5312e-06  eta: 0:03:16  time: 0.4921  data_time: 0.0080  memory: 11508  loss: 1.2648  grad_norm: 0.1869\n",
      "06/02 04:38:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6200/6543]  lr: 1.4461e-06  eta: 0:03:11  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.3560  grad_norm: 0.1864\n",
      "06/02 04:38:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6210/6543]  lr: 1.3634e-06  eta: 0:03:05  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.2562  grad_norm: 0.1856\n",
      "06/02 04:38:43 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6220/6543]  lr: 1.2832e-06  eta: 0:03:00  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.1752  grad_norm: 0.1856\n",
      "06/02 04:38:48 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6230/6543]  lr: 1.2054e-06  eta: 0:02:54  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.2900  grad_norm: 0.1854\n",
      "06/02 04:38:53 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6240/6543]  lr: 1.1300e-06  eta: 0:02:48  time: 0.4929  data_time: 0.0080  memory: 11508  loss: 1.3011  grad_norm: 0.1857\n",
      "06/02 04:38:58 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6250/6543]  lr: 1.0570e-06  eta: 0:02:43  time: 0.4921  data_time: 0.0081  memory: 11508  loss: 1.2892  grad_norm: 0.1857\n",
      "06/02 04:39:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6260/6543]  lr: 9.8640e-07  eta: 0:02:37  time: 0.4930  data_time: 0.0080  memory: 11508  loss: 1.2576  grad_norm: 0.1855\n",
      "06/02 04:39:08 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6270/6543]  lr: 9.1827e-07  eta: 0:02:32  time: 0.4922  data_time: 0.0080  memory: 11508  loss: 1.2803  grad_norm: 0.1855\n",
      "06/02 04:39:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6280/6543]  lr: 8.5256e-07  eta: 0:02:26  time: 0.4935  data_time: 0.0080  memory: 11508  loss: 1.2920  grad_norm: 0.1851\n",
      "06/02 04:39:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6290/6543]  lr: 7.8927e-07  eta: 0:02:20  time: 0.4927  data_time: 0.0080  memory: 11508  loss: 1.2763  grad_norm: 0.1850\n",
      "06/02 04:39:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6300/6543]  lr: 7.2842e-07  eta: 0:02:15  time: 0.4921  data_time: 0.0081  memory: 11508  loss: 1.3185  grad_norm: 0.1850\n",
      "06/02 04:39:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6310/6543]  lr: 6.7001e-07  eta: 0:02:09  time: 0.4929  data_time: 0.0081  memory: 11508  loss: 1.3256  grad_norm: 0.1841\n",
      "06/02 04:39:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6320/6543]  lr: 6.1402e-07  eta: 0:02:04  time: 0.4929  data_time: 0.0081  memory: 11508  loss: 1.3195  grad_norm: 0.1849\n",
      "06/02 04:39:37 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6330/6543]  lr: 5.6047e-07  eta: 0:01:58  time: 0.4918  data_time: 0.0079  memory: 11508  loss: 1.3087  grad_norm: 0.1849\n",
      "06/02 04:39:42 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6340/6543]  lr: 5.0936e-07  eta: 0:01:52  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.2418  grad_norm: 0.1843\n",
      "06/02 04:39:47 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6350/6543]  lr: 4.6068e-07  eta: 0:01:47  time: 0.4921  data_time: 0.0080  memory: 11508  loss: 1.2665  grad_norm: 0.1843\n",
      "06/02 04:39:52 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6360/6543]  lr: 4.1445e-07  eta: 0:01:41  time: 0.4927  data_time: 0.0079  memory: 11508  loss: 1.3181  grad_norm: 0.1837\n",
      "06/02 04:39:57 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6370/6543]  lr: 3.7065e-07  eta: 0:01:36  time: 0.4928  data_time: 0.0081  memory: 11508  loss: 1.3073  grad_norm: 0.1834\n",
      "06/02 04:40:02 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6380/6543]  lr: 3.2929e-07  eta: 0:01:30  time: 0.4921  data_time: 0.0080  memory: 11508  loss: 1.2653  grad_norm: 0.1834\n",
      "06/02 04:40:07 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6390/6543]  lr: 2.9038e-07  eta: 0:01:25  time: 0.4931  data_time: 0.0081  memory: 11508  loss: 1.3081  grad_norm: 0.1838\n",
      "06/02 04:40:12 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6400/6543]  lr: 2.5391e-07  eta: 0:01:19  time: 0.4928  data_time: 0.0080  memory: 11508  loss: 1.3498  grad_norm: 0.1838\n",
      "06/02 04:40:17 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6410/6543]  lr: 2.1988e-07  eta: 0:01:13  time: 0.4923  data_time: 0.0080  memory: 11508  loss: 1.3114  grad_norm: 0.1838\n",
      "06/02 04:40:22 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6420/6543]  lr: 1.8830e-07  eta: 0:01:08  time: 0.4934  data_time: 0.0085  memory: 11508  loss: 1.2968  grad_norm: 0.1841\n",
      "06/02 04:40:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6430/6543]  lr: 1.5916e-07  eta: 0:01:02  time: 0.4920  data_time: 0.0079  memory: 11508  loss: 1.3312  grad_norm: 0.1841\n",
      "06/02 04:40:32 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6440/6543]  lr: 1.3247e-07  eta: 0:00:57  time: 0.4930  data_time: 0.0081  memory: 11508  loss: 1.2606  grad_norm: 0.1846\n",
      "06/02 04:40:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6450/6543]  lr: 1.0822e-07  eta: 0:00:51  time: 0.4925  data_time: 0.0079  memory: 11508  loss: 1.3081  grad_norm: 0.1840\n",
      "06/02 04:40:41 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6460/6543]  lr: 8.6423e-08  eta: 0:00:46  time: 0.4921  data_time: 0.0081  memory: 11508  loss: 1.2634  grad_norm: 0.1840\n",
      "06/02 04:40:46 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6470/6543]  lr: 6.7073e-08  eta: 0:00:40  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2690  grad_norm: 0.1844\n",
      "06/02 04:40:51 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6480/6543]  lr: 5.0171e-08  eta: 0:00:34  time: 0.4927  data_time: 0.0078  memory: 11508  loss: 1.3570  grad_norm: 0.1848\n",
      "06/02 04:40:56 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6490/6543]  lr: 3.5719e-08  eta: 0:00:29  time: 0.4922  data_time: 0.0082  memory: 11508  loss: 1.2549  grad_norm: 0.1848\n",
      "06/02 04:41:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6500/6543]  lr: 2.3715e-08  eta: 0:00:23  time: 0.4926  data_time: 0.0079  memory: 11508  loss: 1.2488  grad_norm: 0.1856\n",
      "06/02 04:41:01 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 04:41:03 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海天坛\n",
      "5. 上海南山公园<end_of_turn>\n",
      "\n",
      "06/02 04:41:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Five scenic spots in Shanghai are the Bund, the Oriental Pearl Tower, the Yu Garden, the Shanghai Disneyland, and the Qing Shan Temple.<end_of_turn>\n",
      "\n",
      "06/02 04:41:06 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 6500 iterations\n",
      "06/02 04:41:13 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6510/6543]  lr: 1.4161e-08  eta: 0:00:18  time: 1.1633  data_time: 0.6764  memory: 11508  loss: 1.2668  grad_norm: 0.1856\n",
      "06/02 04:41:18 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6520/6543]  lr: 7.0559e-09  eta: 0:00:12  time: 0.4930  data_time: 0.0081  memory: 11508  loss: 1.3502  grad_norm: 0.1865\n",
      "06/02 04:41:23 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6530/6543]  lr: 2.4010e-09  eta: 0:00:07  time: 0.4926  data_time: 0.0080  memory: 11508  loss: 1.2928  grad_norm: 0.1867\n",
      "06/02 04:41:27 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Iter(train) [6540/6543]  lr: 1.9600e-10  eta: 0:00:01  time: 0.4919  data_time: 0.0079  memory: 11508  loss: 1.2958  grad_norm: 0.1867\n",
      "06/02 04:41:29 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train_iter in EvaluateChatHook.\n",
      "06/02 04:41:31 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海天坛\n",
      "5. 上海东方体育场<end_of_turn>\n",
      "\n",
      "06/02 04:41:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Five scenic spots in Shanghai are the Bund, the Oriental Pearl Tower, the Yu Garden, the Shanghai Disneyland, and the Qing Shan Temple.<end_of_turn>\n",
      "\n",
      "06/02 04:41:33 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Saving checkpoint at 6543 iterations\n",
      "06/02 04:41:36 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - after_train in EvaluateChatHook.\n",
      "06/02 04:41:38 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "请给我介绍五个上海的景点<end_of_turn>\n",
      "<start_of_turn>model\n",
      "1. 上海故宫\n",
      "2. 上海博物馆\n",
      "3. 上海虹桥\n",
      "4. 上海天坛\n",
      "5. 上海东方体育场<end_of_turn>\n",
      "\n",
      "06/02 04:41:40 - mmengine - \u001b[4m\u001b[97mINFO\u001b[0m - Sample output:\n",
      "<bos><start_of_turn>system\n",
      "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
      "<end_of_turn>\n",
      "<start_of_turn>user\n",
      "Please tell me five scenic spots in Shanghai<end_of_turn>\n",
      "<start_of_turn>model\n",
      "Five scenic spots in Shanghai are the Bund, the Oriental Pearl Tower, the Yu Garden, the Shanghai Disneyland, and the Qing Shan Temple.<end_of_turn>\n",
      "\n"
     ]
    }
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Convert the checkpoint to Hugging Face format"
   ],
   "metadata": {
    "id": "SfH0JfEIfYls"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Create a folder to store the LoRA adapter after it is converted to Hugging Face format."
   ],
   "metadata": {
    "id": "hEPS9dIC4RND"
   }
  },
  {
   "cell_type": "code",
   "source": [
    "!mkdir -p work_dirs/gemma_2b_it_qlora_alpaca_e3_hf"
   ],
   "metadata": {
    "id": "huTH59BXiw4o"
   },
   "execution_count": 12,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "source": [
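    "Convert the LoRA adapter in the checkpoint saved at iteration 6500 (`iter_6500.pth`) to Hugging Face format. Any other saved checkpoint, such as the final one written at iteration 6543, can be converted the same way."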
   ],
   "metadata": {
    "id": "Bf4A7zIG4Y3j"
   }
  },
  {
   "cell_type": "code",
   "source": [
    "!xtuner convert pth_to_hf gemma_2b_it_qlora_alpaca_e3 work_dirs/gemma_2b_it_qlora_alpaca_e3/iter_6500.pth work_dirs/gemma_2b_it_qlora_alpaca_e3_hf"
   ],
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "mc4aY2jqcoMM",
    "outputId": "2f38299b-dc08-4b30-deaa-88e93b3869ed"
   },
   "execution_count": 13,
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n",
      "quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>\n",
      "`low_cpu_mem_usage` was None, now set to True since model is quantized.\n",
      "`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.\n",
      "Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use\n",
      "`config.hidden_activation` if you want to override this behaviour.\n",
      "See https://github.com/huggingface/transformers/pull/29402 for more details.\n",
      "Loading checkpoint shards: 100% 2/2 [00:03<00:00,  1.57s/it]\n",
      "Load PTH model from work_dirs/gemma_2b_it_qlora_alpaca_e3/iter_6500.pth\n",
      "Saving adapter to work_dirs/gemma_2b_it_qlora_alpaca_e3_hf\n",
      "Convert LLM to float16\n",
      "All done!\n"
     ]
    }
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Merge LoRA adapter"
   ],
   "metadata": {
    "id": "b4ISL1YhjxpD"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Create a folder for storing the merged model."
   ],
   "metadata": {
    "id": "Wl2R96lX4eWy"
   }
  },
  {
   "cell_type": "code",
   "source": [
    "!mkdir -p work_dirs/gemma_2b_it_qlora_alpaca_e3_merged"
   ],
   "metadata": {
    "id": "ctqJcv0hj0aX"
   },
   "execution_count": 14,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "source": [
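    "Merge the LoRA adapter into the base `google/gemma-2b-it` model. The `--max-shard-size 2GB` option splits the merged weights into files of at most 2 GB each."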
   ],
   "metadata": {
    "id": "SbPLcJUA4iaF"
   }
  },
  {
   "cell_type": "code",
   "source": [
    "!xtuner convert merge google/gemma-2b-it work_dirs/gemma_2b_it_qlora_alpaca_e3_hf work_dirs/gemma_2b_it_qlora_alpaca_e3_merged --max-shard-size 2GB"
   ],
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "iHlMdVWykCXN",
    "outputId": "3b774416-2847-4545-f0f0-936a32d4c22b"
   },
   "execution_count": 15,
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "2024-06-02 04:42:02.310283: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
      "2024-06-02 04:42:02.367089: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
      "2024-06-02 04:42:02.367144: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
      "2024-06-02 04:42:02.369043: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
      "2024-06-02 04:42:02.377440: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
      "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "2024-06-02 04:42:03.397550: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
      "2024-06-02 04:42:09.253437: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
      "2024-06-02 04:42:09.253480: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
      "2024-06-02 04:42:09.254641: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
      "2024-06-02 04:42:10.256475: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
      "`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.\n",
      "Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use\n",
      "`config.hidden_activation` if you want to override this behaviour.\n",
      "See https://github.com/huggingface/transformers/pull/29402 for more details.\n",
      "Loading checkpoint shards: 100% 2/2 [00:03<00:00,  1.56s/it]\n",
      "Saving to work_dirs/gemma_2b_it_qlora_alpaca_e3_merged...\n",
      "All done!\n"
     ]
    }
   ]
  },
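  {
   "cell_type": "markdown",
   "source": [
    "Optionally, sanity-check the merged model before uploading it by generating a reply to one of the evaluation prompts used during training. This is a minimal sketch that assumes the merge step also saved the tokenizer in the merged folder and that a CUDA GPU is available. It builds the prompt with Gemma's built-in chat template rather than the XTuner prompt template (with its system turn) used during training, so the reply may differ from the sample outputs logged above."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "source": [
    "import torch\n",
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "\n",
    "# Assumption: the merge step saved both the weights and the tokenizer here.\n",
    "merged_dir = \"work_dirs/gemma_2b_it_qlora_alpaca_e3_merged\"\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(merged_dir)\n",
    "# Load in float16 on the GPU to keep memory use low for a quick check.\n",
    "test_model = AutoModelForCausalLM.from_pretrained(\n",
    "    merged_dir, torch_dtype=torch.float16\n",
    ").to(\"cuda\")\n",
    "\n",
    "# Build the prompt with Gemma's chat template (not XTuner's training template).\n",
    "messages = [{\"role\": \"user\", \"content\": \"Please tell me five scenic spots in Shanghai\"}]\n",
    "input_ids = tokenizer.apply_chat_template(\n",
    "    messages, add_generation_prompt=True, return_tensors=\"pt\"\n",
    ").to(test_model.device)\n",
    "\n",
    "# Greedy decoding with a small token budget is enough for a sanity check.\n",
    "output_ids = test_model.generate(input_ids, max_new_tokens=128, do_sample=False)\n",
    "# Decode only the newly generated tokens, dropping the prompt.\n",
    "print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))\n",
    "\n",
    "# Free GPU memory before the upload steps.\n",
    "del test_model\n",
    "torch.cuda.empty_cache()"
   ],
   "metadata": {},
   "execution_count": null,
   "outputs": []
  },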
  {
   "cell_type": "markdown",
   "source": [
    "### Upload model to Hugging Face"
   ],
   "metadata": {
    "id": "oeZUSRbHhbV2"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Load the merged model from disk."
   ],
   "metadata": {
    "id": "ur5PRpAR4IzJ"
   }
  },
  {
   "cell_type": "code",
   "source": [
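    "from transformers import AutoModelForCausalLM\n",
    "\n",
    "# Load the merged checkpoint with its causal-LM head so the uploaded model can generate text.\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",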
    "    \"work_dirs/gemma_2b_it_qlora_alpaca_e3_merged\", local_files_only=True\n",
    ")"
   ],
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 49,
     "referenced_widgets": [
      "28bd715d5b86416f94b948b289cf65a4",
      "9d043e76f08248e09ca3759f86be2c5d",
      "1012c0b13725469bb4e73b95f3e4b98b",
      "7b1f895c17f5458bb2893f2fc9750e10",
      "3bee3e7e8cd24fb7b35c9a38f5c6af87",
      "2ea7b49489a2499db681ab07487ec90c",
      "5b0431b5435347708c476916ab7e71e9",
      "a07543f015fd4caeb69751c583ef7efe",
      "8ed186c45d494416beb846deaca8a1c6",
      "f0c85660507b49ad89897942cbeb309e",
      "a31aea533678414daf932d5c642ea28b"
     ]
    },
    "id": "hISor3IolTYB",
    "outputId": "b2389cb4-5806-40ea-f772-80761ff70038"
   },
   "execution_count": 16,
   "outputs": [
    {
     "output_type": "display_data",
     "data": {
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]"
      ],
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "28bd715d5b86416f94b948b289cf65a4"
      }
     },
     "metadata": {}
    }
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
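    "Push the merged model to the Hugging Face Hub. This requires being logged in with a Hugging Face token that has write access. If the repository does not exist yet, it is created under your account."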
   ],
   "metadata": {
    "id": "Js4VK8pi4MGq"
   }
  },
  {
   "cell_type": "code",
   "source": [
    "model.push_to_hub(\"gemma-2-finetuned-model-xtuner\")"
   ],
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 181,
     "referenced_widgets": [
      "6679c7b5c0d740789f4e788e05c226b1",
      "029a74ee07854f58a756ef1ffdbbb44a",
      "cf1d908c37e840c8a184c7a2beea21c7",
      "70e754ce7e724242b97e0981ab83d504",
      "493ec5486f4743b8a3306fbc88a1fdb2",
      "615d0bcf2dcb46118e9e0832c9f56c48",
      "6f19265460464361a394254ecaf7040a",
      "e61f494a85644d74a76ea0495a6b03d0",
      "bff8375e4261431fad75e284d71de513",
      "e1873b49271b498c9eb0ac72a217117f",
      "745f3bd1ffdf463a8f188c170c902f3d",
      "5d190d0a6b82450ea44af79a2a0bd281",
      "1d285611daa4489494f70d8f39d0ea6f",
      "4b3d654744d24514b03e0cdea8424404",
      "ffd08b90599c4460bc24990c0cb08759",
      "2706a19741d24ddf9954dd09501661db",
      "8fc9ffd59440469e9e6e457beb99f7f8",
      "7237a15df203473a826c3046542a2de4",
      "0c3bf85907ea468599d8def54f7b8824",
      "be283a5eb29b4e7899a0007c31c9fbce",
      "f0710f43b0fd4fd9b2441130236b94a5",
      "ced7492573274d9980517f71da4deb53",
      "a49246e53bd24c1a9f7cc6cd29422d27",
      "709426f862f540fea008aea04d807aa3",
      "d3f29a6b96cf4a609055f6f442c9a072",
      "2ccdd27f81074840ac9370f3da745bb3",
      "651b5ce3e22d408c8ec05178ad223a55",
      "6fe591f841c549dcaecae4c8bb113393",
      "ebe7ba399b9744fcaebd2bf1b6e1a4ad",
      "c23ac33e5f0547d685eaa76eab4227bf",
      "e6fe7fb90179412fa06376a028d37359",
      "978e5e05d9964df48a842a5035065a28",
      "a6fa33b016214472b2bc08d9ed29a6cd",
      "103d2f47bd3041dfadfe51c4efbde30d",
      "28ded162c1af4358b72ec6bbb11eaa4b",
      "3f1c07c695e34c22b8a4fe812be01e5c",
      "c52cec21832946e889cca5062248ae07",
      "68438ea2108f41d58ff03c0cdf0c32a3",
      "3b20ea6dddf8469f92ffb78fc9c7c9c9",
      "0fdbd39d405947e9800038e9b8cd677d",
      "a0054d9a3f414faa8f8164bb73085c8f",
      "6769c74148654cab8897d8bb8a1debc2",
      "5aca06496bdd43a5984e7e9739d5b3ee",
      "c42c4f391f6e43f982d41d6a7b1ed696"
     ]
    },
    "id": "w84le7s5jyY_",
    "outputId": "e6251791-be6c-4d2c-8bab-4488dd41d8b6"
   },
   "execution_count": 17,
   "outputs": [
    {
     "output_type": "display_data",
     "data": {
      "text/plain": [
       "model-00001-of-00003.safetensors:   0%|          | 0.00/4.91G [00:00<?, ?B/s]"
      ],
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "6679c7b5c0d740789f4e788e05c226b1"
      }
     },
     "metadata": {}
    },
    {
     "output_type": "display_data",
     "data": {
      "text/plain": [
       "model-00003-of-00003.safetensors:   0%|          | 0.00/134M [00:00<?, ?B/s]"
      ],
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "5d190d0a6b82450ea44af79a2a0bd281"
      }
     },
     "metadata": {}
    },
    {
     "output_type": "display_data",
     "data": {
      "text/plain": [
       "model-00002-of-00003.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]"
      ],
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "a49246e53bd24c1a9f7cc6cd29422d27"
      }
     },
     "metadata": {}
    },
    {
     "output_type": "display_data",
     "data": {
      "text/plain": [
       "Upload 3 LFS files:   0%|          | 0/3 [00:00<?, ?it/s]"
      ],
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "103d2f47bd3041dfadfe51c4efbde30d"
      }
     },
     "metadata": {}
    },
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "CommitInfo(commit_url='https://huggingface.co/windmaple/gemma-2-finetuned-model-xtuner/commit/173d08858a594ed07939f68abb9050f6ceccdf61', commit_message='Upload model', commit_description='', oid='173d08858a594ed07939f68abb9050f6ceccdf61', pr_url=None, pr_revision=None, pr_num=None)"
      ],
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      }
     },
     "metadata": {},
     "execution_count": 17
    }
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Conclusion\n",
    "\n",
    "This notebook demonstrated how to use XTuner to instruction-tune the Gemma 2B IT model. To finetune on a different dataset, see the XTuner documentation on how to [prepare your own dataset](https://github.com/InternLM/xtuner/blob/main/docs/en/user_guides/dataset_prepare.md)."
   ],
   "metadata": {
    "id": "upvIpXvBgbve"
   }
  }
 ],
 "metadata": {
  "colab": {
   "provenance": [],
   "gpuType": "A100",
   "machine_shape": "hm"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "accelerator": "GPU",
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "28bd715d5b86416f94b948b289cf65a4": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HBoxModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_9d043e76f08248e09ca3759f86be2c5d",
       "IPY_MODEL_1012c0b13725469bb4e73b95f3e4b98b",
       "IPY_MODEL_7b1f895c17f5458bb2893f2fc9750e10"
      ],
      "layout": "IPY_MODEL_3bee3e7e8cd24fb7b35c9a38f5c6af87"
     }
    },
    "9d043e76f08248e09ca3759f86be2c5d": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_2ea7b49489a2499db681ab07487ec90c",
      "placeholder": "​",
      "style": "IPY_MODEL_5b0431b5435347708c476916ab7e71e9",
      "value": "Loading checkpoint shards: 100%"
     }
    },
    "1012c0b13725469bb4e73b95f3e4b98b": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "FloatProgressModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_a07543f015fd4caeb69751c583ef7efe",
      "max": 3,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_8ed186c45d494416beb846deaca8a1c6",
      "value": 3
     }
    },
    "7b1f895c17f5458bb2893f2fc9750e10": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_f0c85660507b49ad89897942cbeb309e",
      "placeholder": "​",
      "style": "IPY_MODEL_a31aea533678414daf932d5c642ea28b",
      "value": " 3/3 [00:01&lt;00:00,  1.83it/s]"
     }
    },
    "3bee3e7e8cd24fb7b35c9a38f5c6af87": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "2ea7b49489a2499db681ab07487ec90c": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "5b0431b5435347708c476916ab7e71e9": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "a07543f015fd4caeb69751c583ef7efe": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "8ed186c45d494416beb846deaca8a1c6": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "ProgressStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": ""
     }
    },
    "f0c85660507b49ad89897942cbeb309e": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "a31aea533678414daf932d5c642ea28b": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "6679c7b5c0d740789f4e788e05c226b1": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HBoxModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_029a74ee07854f58a756ef1ffdbbb44a",
       "IPY_MODEL_cf1d908c37e840c8a184c7a2beea21c7",
       "IPY_MODEL_70e754ce7e724242b97e0981ab83d504"
      ],
      "layout": "IPY_MODEL_493ec5486f4743b8a3306fbc88a1fdb2"
     }
    },
    "029a74ee07854f58a756ef1ffdbbb44a": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_615d0bcf2dcb46118e9e0832c9f56c48",
      "placeholder": "​",
      "style": "IPY_MODEL_6f19265460464361a394254ecaf7040a",
      "value": "model-00001-of-00003.safetensors: 100%"
     }
    },
    "cf1d908c37e840c8a184c7a2beea21c7": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "FloatProgressModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_e61f494a85644d74a76ea0495a6b03d0",
      "max": 4911634832,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_bff8375e4261431fad75e284d71de513",
      "value": 4911634832
     }
    },
    "70e754ce7e724242b97e0981ab83d504": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_e1873b49271b498c9eb0ac72a217117f",
      "placeholder": "​",
      "style": "IPY_MODEL_745f3bd1ffdf463a8f188c170c902f3d",
      "value": " 4.91G/4.91G [01:42&lt;00:00, 40.3MB/s]"
     }
    },
    "493ec5486f4743b8a3306fbc88a1fdb2": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "615d0bcf2dcb46118e9e0832c9f56c48": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "6f19265460464361a394254ecaf7040a": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "e61f494a85644d74a76ea0495a6b03d0": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "bff8375e4261431fad75e284d71de513": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "ProgressStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": ""
     }
    },
    "e1873b49271b498c9eb0ac72a217117f": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "745f3bd1ffdf463a8f188c170c902f3d": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "5d190d0a6b82450ea44af79a2a0bd281": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HBoxModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_1d285611daa4489494f70d8f39d0ea6f",
       "IPY_MODEL_4b3d654744d24514b03e0cdea8424404",
       "IPY_MODEL_ffd08b90599c4460bc24990c0cb08759"
      ],
      "layout": "IPY_MODEL_2706a19741d24ddf9954dd09501661db"
     }
    },
    "1d285611daa4489494f70d8f39d0ea6f": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_8fc9ffd59440469e9e6e457beb99f7f8",
      "placeholder": "​",
      "style": "IPY_MODEL_7237a15df203473a826c3046542a2de4",
      "value": "model-00003-of-00003.safetensors: 100%"
     }
    },
    "4b3d654744d24514b03e0cdea8424404": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "FloatProgressModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_0c3bf85907ea468599d8def54f7b8824",
      "max": 134242736,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_be283a5eb29b4e7899a0007c31c9fbce",
      "value": 134242736
     }
    },
    "ffd08b90599c4460bc24990c0cb08759": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_f0710f43b0fd4fd9b2441130236b94a5",
      "placeholder": "​",
      "style": "IPY_MODEL_ced7492573274d9980517f71da4deb53",
      "value": " 134M/134M [00:03&lt;00:00, 35.2MB/s]"
     }
    },
    "2706a19741d24ddf9954dd09501661db": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "8fc9ffd59440469e9e6e457beb99f7f8": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "7237a15df203473a826c3046542a2de4": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "0c3bf85907ea468599d8def54f7b8824": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "be283a5eb29b4e7899a0007c31c9fbce": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "ProgressStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": ""
     }
    },
    "f0710f43b0fd4fd9b2441130236b94a5": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "ced7492573274d9980517f71da4deb53": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "a49246e53bd24c1a9f7cc6cd29422d27": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HBoxModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_709426f862f540fea008aea04d807aa3",
       "IPY_MODEL_d3f29a6b96cf4a609055f6f442c9a072",
       "IPY_MODEL_2ccdd27f81074840ac9370f3da745bb3"
      ],
      "layout": "IPY_MODEL_651b5ce3e22d408c8ec05178ad223a55"
     }
    },
    "709426f862f540fea008aea04d807aa3": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_6fe591f841c549dcaecae4c8bb113393",
      "placeholder": "​",
      "style": "IPY_MODEL_ebe7ba399b9744fcaebd2bf1b6e1a4ad",
      "value": "model-00002-of-00003.safetensors: 100%"
     }
    },
    "d3f29a6b96cf4a609055f6f442c9a072": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "FloatProgressModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_c23ac33e5f0547d685eaa76eab4227bf",
      "max": 4978829984,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_e6fe7fb90179412fa06376a028d37359",
      "value": 4978829984
     }
    },
    "2ccdd27f81074840ac9370f3da745bb3": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_978e5e05d9964df48a842a5035065a28",
      "placeholder": "​",
      "style": "IPY_MODEL_a6fa33b016214472b2bc08d9ed29a6cd",
      "value": " 4.98G/4.98G [01:47&lt;00:00, 48.7MB/s]"
     }
    },
    "651b5ce3e22d408c8ec05178ad223a55": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "6fe591f841c549dcaecae4c8bb113393": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "ebe7ba399b9744fcaebd2bf1b6e1a4ad": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "c23ac33e5f0547d685eaa76eab4227bf": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "e6fe7fb90179412fa06376a028d37359": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "ProgressStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": ""
     }
    },
    "978e5e05d9964df48a842a5035065a28": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "a6fa33b016214472b2bc08d9ed29a6cd": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "103d2f47bd3041dfadfe51c4efbde30d": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HBoxModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_28ded162c1af4358b72ec6bbb11eaa4b",
       "IPY_MODEL_3f1c07c695e34c22b8a4fe812be01e5c",
       "IPY_MODEL_c52cec21832946e889cca5062248ae07"
      ],
      "layout": "IPY_MODEL_68438ea2108f41d58ff03c0cdf0c32a3"
     }
    },
    "28ded162c1af4358b72ec6bbb11eaa4b": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_3b20ea6dddf8469f92ffb78fc9c7c9c9",
      "placeholder": "​",
      "style": "IPY_MODEL_0fdbd39d405947e9800038e9b8cd677d",
      "value": "Upload 3 LFS files: 100%"
     }
    },
    "3f1c07c695e34c22b8a4fe812be01e5c": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "FloatProgressModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_a0054d9a3f414faa8f8164bb73085c8f",
      "max": 3,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_6769c74148654cab8897d8bb8a1debc2",
      "value": 3
     }
    },
    "c52cec21832946e889cca5062248ae07": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "model_module_version": "1.5.0",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_5aca06496bdd43a5984e7e9739d5b3ee",
      "placeholder": "​",
      "style": "IPY_MODEL_c42c4f391f6e43f982d41d6a7b1ed696",
      "value": " 3/3 [01:47&lt;00:00, 34.70s/it]"
     }
    },
    "68438ea2108f41d58ff03c0cdf0c32a3": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "3b20ea6dddf8469f92ffb78fc9c7c9c9": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "0fdbd39d405947e9800038e9b8cd677d": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "a0054d9a3f414faa8f8164bb73085c8f": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "6769c74148654cab8897d8bb8a1debc2": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "ProgressStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": ""
     }
    },
    "5aca06496bdd43a5984e7e9739d5b3ee": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "model_module_version": "1.2.0",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "c42c4f391f6e43f982d41d6a7b1ed696": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "model_module_version": "1.5.0",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}